ABSTRACT


A desirable feature of a database management system is
the ability to support many views of the database via
several user models. In order to provide this support
while allowing the user to believe that his/her view and
data model are the only ones, the database system must
have a number of facilities. One of the most important
of these is a mechanism to tell when view constraints
will be satisfied given that the underlying database
constraints are satisfied, so that the user always sees
what is expected.

This paper deals with a particular instance of this
problem where the constraints are functional dependencies
and the views are created through relational algebra
expressions. The problem immediately reduces to the
problem of calculating all valid functional dependencies
(and other constraints) on a relational algebra expression
over relations in the base schema. The problem is
undecidable in general, but we give a sound and complete
algorithm when set difference is omitted from relational
algebra.
CALCULATING CONSTRAINTS ON RELATIONAL EXPRESSIONS

A. Klug


1. Introduction

Database management systems exist to (among other things) give
users facilities suited to their needs, privileges and levels of
expertise. One tool for providing these facilities is the view
(e.g., [ChGT]). A view can mask off parts of the database; it
can rearrange the database's structure, and it can accept a
customized set of operations. In general, a view can tailor the
database to the needs and privileges of a user.

Ideally, users of a view should not have to be aware of the
fact that they are interacting with a view of the database
rather than with a "ground level" database. Now schemas
generally contain various constraints in addition to object
descriptions. That is, the schema not only tells the user what
objects are available and in what relationships, but also what
restrictions on the form and structures of the objects hold.

This means that, if the user need not be aware of the level of
indirection introduced by the view, there must be some built-in
database mechanism which can ensure that these view constraints
are always satisfied. An example will help clarify what we
mean:

Suppose the underlying (or conceptual level [ANSI]) database is
described by the relational schema:

Car-Part(CPI, Color, Wt)

Boat-Part(BPI, Color, Wt).

Here, CPI and BPI are keys of their respective relations.

Sponsored by the United States Army under Contract No. DAAG29-75-C-0024. This work is
based upon work supported by the Computer Science Department, University of Wisconsin-
Madison and the National Research Council of Canada.
Suppose also that some user wants to have a view according to
the schema

Part(PI, Wt).

(We  will  specify  below  how  this  view  is  to  be  derived  from  the 
base  relations.)  The  constraints  in  these  schemas  are  that  CPI, 
BPI  and  PI  are  keys  in  their  respective  relations. 

Suppose the Part view was derived by the Sequel [ABCE] statement:

select CPI, Wt from Car-Part where Color = 'red'.

Then the key constraint in the user view would always be satisfied
since the view is just a subset of Car-Part.

Instead, suppose the Part view were equivalent to the retrieval:

(select CPI, Wt from Car-Part)
union
(select BPI, Wt from Boat-Part)

Here, the user may fail to be presented with a view in which PI
is a key of the Part relation. This is because we might have a
Car-Part with the same number as a Boat-Part, but the two
weights may be different.
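The two views can be compared on a small sample. The sketch below is only illustrative: the rows are invented, and `is_key` is a hypothetical helper that checks the key property in one particular database state, which is weaker than the validity question studied in this paper.

```python
def is_key(rows, key_idx):
    """True if no two distinct rows agree on the key positions (0-based)."""
    seen = {}
    for row in rows:
        k = tuple(row[i] for i in key_idx)
        if seen.setdefault(k, row) != row:
            return False
    return True

# Hypothetical sample data: (part number, colour, weight).
car_part = {(1, "red", 10), (2, "blue", 12)}
boat_part = {(1, "red", 99)}

# First view: select CPI, Wt from Car-Part where Color = 'red'
view1 = {(p, w) for (p, c, w) in car_part if c == "red"}

# Second view: union of the (number, weight) projections of both relations
view2 = {(p, w) for (p, _, w) in car_part} | {(p, w) for (p, _, w) in boat_part}

print(is_key(view1, [0]))  # True
print(is_key(view2, [0]))  # False: part 1 appears with weights 10 and 99
```

A single state can only refute a constraint; showing that the first view always satisfies its key constraint requires reasoning over all states.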

What is needed is a mechanism which can tell that the first
view is acceptable while the second is not. More generally, a
mechanism is needed which, given any underlying schema, any view
schema, and any mapping between the two, can decide whether or
not the constraints in the view schema will always be satisfied
(assuming the underlying constraints are always satisfied). In
terms of the framework proposed by the ANSI/X3/SPARC database
study group [ANSI], we are recognizing the need for a processing
function which can accept or reject mappings between a conceptual
schema and an external schema. Note that correct rejection
is just as important as correct acceptance. Correct acceptance
means that the processor would not accept any mappings which
would cause view constraint violations; correct rejection means
that the processing function would not reject any mapping which
would never cause any constraint violation. Recognizing this
type of correctness of mappings is an important problem and one
which we study in this paper. More details on this subject may
[...]


The key constraints we used in the above example are closely
related to functional dependencies. Functional dependency
constraints, which state that certain attributes are in functional
[...] ubiquitous in data modelling. Any algorithms for the problem
we are discussing must at least be applicable to schemas containing
functional dependency constraints.


In the example, the mapping of the view to the base schema was
made via the Sequel language. For purposes of the presentation
we will have all mappings expressed as relational algebra
expressions [Codd]. Then calculating whether or not some
functional dependency on a relation in a view is valid is the same
as calculating whether or not this functional dependency is
valid on the relational algebra expression corresponding to the
view relation. Hence, the problem we consider in this paper is
the following:

    Given a schema s containing relations and functional
    dependencies and an arbitrary relational algebra expression e
    over relations in s, determine exactly the set of valid
    functional dependencies on e.

In the next section, we will give the basic definitions to be
used throughout the paper. Section 3 will formulate the
algorithm, and Section 4 will show it is sound. Then in Section 5
we accomplish the more difficult task of showing that the
algorithm is complete. Finally, in Section 6 we draw some
conclusions.

2. Definitions

In  this  section  we  will  present  the  basic  concepts  of  schema, 
relational  algebra  expression,  structure,  state  and  validity. 

A relation declaration has the form 'name(degree)'; for example,
R(10) defines a relation R of degree 10.

A relational algebra expression consists of relation names and
the operators: projection e[X], restriction e[X=Y], selection
e[X=V], cross product 'X', union 'U' and difference '-'. The
syntax is defined by the following BNFs:

(I)    e ::= name | e[X] | e[X=Y] | e[X=V] |
             eXe | eUe | e-e

(II)   e ::= name | e[X] | e[X=Y] | e[X=V] |
             eXe | eUe

(III)  e ::= name | [...] e[X=V] [...]


We will see later why we have four variants of relational
algebra. Unless otherwise stated, the general form (I) is
understood.

The join operator is a derived operator in this framework and
is defined as follows:

R[X=Y]S is (RXS)[X=Y'], where Y' = Y + deg(R).

Intersection can also be defined from the given operators.


A functional dependency statement (FD) has the form
'expr:d*->d', where expr is a relational algebra expression, d*
is a set of zero or more domain numbers, and d is a domain
number. (We number relation domains rather than name them in
order to facilitate joining relations with themselves.)

A schema consists of a set of relation declarations and
functional dependency statements on the relations named in the
schema.


A structure str for a schema s consists of a universe U of
data elements, for each constant symbol c an interpretation of c
in U, and for each relation R(n) in the schema s, a set R(str) of
n-tuples over U. (The constant symbol interpretations allow
arbitrary data universes to be properly modelled.)

The value e(str) of a relational algebra expression e in a
structure str is defined by interpreting the relational algebra
operators in the usual manner:

(1) R(str) is itself.

(2) e[X](str) = {t[X] : t ∈ e(str)}

(3) e[X=Y](str) = {t : t[X]=t[Y] & t ∈ e(str)}

(4) e[X=V](str) = {t : t[X]=V & t ∈ e(str)}

(5) (e1Xe2)(str) = {t1Xt2 : t1 ∈ e1(str) & t2 ∈ e2(str)},
    where t1Xt2 denotes the concatenation of t1 and t2

(6) (e1Ue2)(str) = {t : t ∈ e1(str) or t ∈ e2(str)}

(7) (e1-e2)(str) = {t : t ∈ e1(str) & t ∉ e2(str)}
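The interpretation of the operators can be sketched directly on sets of tuples. This is a minimal illustration, not part of the paper: the function names are hypothetical, relations are Python sets of tuples, and domain numbers are 1-based as in the text. Union and difference are the built-in set operators `|` and `-`.

```python
def project(rel, xs):
    """e[X]: keep the domains listed in xs, in that order."""
    return {tuple(t[i - 1] for i in xs) for t in rel}

def restrict(rel, x, y):
    """e[X=Y]: keep tuples whose domains x and y are equal."""
    return {t for t in rel if t[x - 1] == t[y - 1]}

def select(rel, x, v):
    """e[X=V]: keep tuples whose domain x equals the constant v."""
    return {t for t in rel if t[x - 1] == v}

def cross(r, s):
    """e1 X e2: concatenate every pair of tuples."""
    return {t1 + t2 for t1 in r for t2 in s}

def join(r, s, x, y, deg_r):
    """R[X=Y]S = (R X S)[X=Y'], where Y' = Y + deg(R)."""
    return restrict(cross(r, s), x, y + deg_r)

# Hypothetical sample relations of degree 2.
R = {(1, 2), (3, 4)}
S = {(2, 5), (9, 9)}

print(join(R, S, 2, 1, deg_r=2))  # {(1, 2, 2, 5)}
print(project(R, [2, 1]))         # the two domains of R swapped
print(R - {(3, 4)})               # {(1, 2)}
```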

A functional dependency e:Z->A is true in a structure str if
for every pair t1, t2 of tuples in e(str), if t1[Z]=t2[Z], then
t1[A]=t2[A].

A structure str is a state with respect to a schema s if every
functional dependency in s is true in str.

A functional dependency e:Z->A is valid with respect to a
schema s if e:Z->A is true in every state of s.
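Truth of an FD in one structure is easy to test mechanically; validity is not, since it quantifies over every state. The following sketch (the name `fd_true` is an assumption, and domains are 1-based) checks truth only, for a relation value given as a set of tuples.

```python
def fd_true(rel, zs, a):
    """True if the FD Z->A holds in rel, i.e. tuples that agree on
    the domains in zs also agree on domain a."""
    seen = {}
    for t in rel:
        key = tuple(t[z - 1] for z in zs)
        if seen.setdefault(key, t[a - 1]) != t[a - 1]:
            return False
    return True

print(fd_true({(0, 0), (1, 0)}, [1], 2))  # True
print(fd_true({(0, 0), (0, 1)}, [1], 2))  # False
print(fd_true({(1, 5), (2, 5)}, [], 2))   # True: domain 2 is constant
```

The last call uses an empty left-hand side, which the definition of an FD statement permits ("zero or more domain numbers").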

We  can  now  state  our  problem  precisely: 

    Given a schema s and a relational algebra expression e over
    s, determine the valid FDs on e.


3. The Algorithm

This section will present rules for determining the valid
constraints on a relational algebra expression. We limit our study
to functional dependencies because FDs are important, familiar
and well-understood. However, because the relations we look at
are actually expressions involving restrictions and selections,
it will be necessary to define two additional types of
constraints which will reflect the structure introduced by these
operators. We will call these constraints equality constraints
(EQs).

Consider  the  following  example: 

s: R(2), S(2)
   R:1->2, S:1->2
e: R[2=1]S

In e the FDs 1->2 and 3->4 are valid. Because domains 2 and 3,
which are the joined domains, are always equal in e, we can
follow the FD 1->2 and then 3->4 to get a valid FD 1->4 in e. In
order to be able to formally derive this FD, we need domain
equalities (DEQs) as a constraint type:

A domain equality has the form e:X=Y. It is true in a structure
str if t[X]=t[Y] for each t ∈ e(str). A DEQ e:X=Y is valid
with respect to a schema s if it is true in every state of s.
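The join example can be checked on a sample state. The data below are invented and the helper names are hypothetical; the sketch only confirms that the DEQ 2=3 and the FD 1->4 are true in this one state of the join R[2=1]S, which is consistent with, but does not prove, their validity.

```python
def fd_true(rel, zs, a):
    """True if tuples agreeing on domains zs also agree on domain a (1-based)."""
    seen = {}
    for t in rel:
        key = tuple(t[z - 1] for z in zs)
        if seen.setdefault(key, t[a - 1]) != t[a - 1]:
            return False
    return True

def deq_true(rel, x, y):
    """True if domains x and y are equal in every tuple (1-based)."""
    return all(t[x - 1] == t[y - 1] for t in rel)

# Hypothetical state in which R:1->2 and S:1->2 are both true.
R = {(1, 10), (2, 10), (3, 20)}
S = {(10, 7), (20, 8), (30, 9)}

# e = R[2=1]S, i.e. (R X S)[2=3]
e = {r + s for r in R for s in S if r[1] == s[0]}

print(deq_true(e, 2, 3))   # True
print(fd_true(e, [1], 4))  # True
```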


Next consider the example:

s: R(3), R:1,2->3
e: R[2='8']

In this example, states for the expression e will always have a
value '8' in domain 2. Thus, given a value for domain 1, the
value of domain 2 is determined, and therefore domain 3 is
determined by the FD. Hence, the FD 1->3 is valid in e. In
order to be able to formally derive this FD, we need value
equalities (VEQs) as a constraint type:

A value equality has the form e:X=V. It is true in a structure
str if t[X]=V for every t ∈ e(str). A VEQ e:X=V is valid with
respect to a schema s if it is true in every state of s.

We now proceed to derive inference rules for these
constraints. The natural way to develop these rules is in two
steps. First, we will need rules describing the derivation of
new constraints from old ones on the same expression. For
example, regardless of what the expression is, we can derive 1->3
from 1->2 and 2->3. The second group of rules will tell how to
derive constraints on an expression from the derivable
constraints on its subexpressions. For example, from R:2->3 we can
derive 1->2 on the projection R[2,3].

We  will  first  discuss  constraints  on  single  expressions. 

When only FDs are considered, complete sets of rules already
exist (e.g., [Arms]). The following three rules are one such set:

(A1) X->X (reflexivity)

(A2) X->Z |- XUY->Z (augmentation)

(A3) X->Y, YUZ->W |- XUZ->W (pseudotransitivity)

(where '|-' means "derives")

See [Bern] for examples.
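One common way to use rules of this kind in practice is the standard attribute-closure computation: starting from a set of domains, repeatedly add the right-hand side of any FD whose left-hand side is already covered. This is a textbook technique, sketched here under the assumption that FDs are given as (left-hand set, right-hand domain) pairs; it is not the labelled procedure developed later in this paper.

```python
def closure(attrs, fds):
    """All domains determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, lhs a frozenset of domain
    numbers and rhs a single domain number.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

fds = [(frozenset([1]), 2), (frozenset([2]), 3)]
print(closure({1}, fds))  # {1, 2, 3}, so 1->3 is derivable
print(closure({3}, fds))  # {3}
```

An FD Z->A is derivable from the given FDs exactly when A is in `closure(Z, fds)`.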

Now  suppose  that  EQs  are  also  involved. 

The set of rules must include ones reflecting the basic
properties of equality: Everything equals itself; hence, X=X will
always be derivable. The constraint X=Y implies the constraint
Y=X. Also X=Y and Y=Z imply X=Z.

VEQs also interact with DEQs: If X=Y and Y=V are
constraints, then the X-domains are constant and also equal V,
i.e., X=V is a valid constraint. If X=V and Y=V are derivable,
then so is X=Y.

EQs also interact with FDs. If X=Y holds and X appears on
the left- or right-hand-side of an FD, then the FD with X
replaced by Y should also hold. If X=V holds and X appears on
the left-hand-side of an FD, then since the X-domain is
constant, we can drop X from the FD.

From this discussion we see that the following rules are
valid:

(i)   Z->A, X=Y |- (Z-X)UY->A

(ii)  Z->A, A=B |- Z->B

(iii) Z->A, X=V |- (Z-X)->A

Instead, we will use the following two rules which are
equivalent to (i) - (iii):

(a) X=Y |- X->Y
(b) X=V |- ∅->X

With these two rules, we can drop the reflexive rule (A1) above,
and we can also derive rules (i), (ii) and (iii) using
transitivity.
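The effect of rules (a) and (b) can be illustrated by turning EQs into ordinary FDs and then running a standard attribute closure. This is an assumption-laden sketch rather than the paper's labelled procedure: `eq_to_fds` and `closure` are hypothetical names, a DEQ is a pair of domains, and a VEQ is a (domain, value) pair. Because X=Y also yields Y=X, each DEQ contributes an FD in both directions.

```python
def closure(attrs, fds):
    """All domains determined by attrs; fds are (frozenset lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def eq_to_fds(deqs, veqs):
    """Rule (a): X=Y gives X->Y (and Y->X by symmetry of equality).
    Rule (b): X=V gives the FD with empty left-hand side, ∅->X."""
    fds = []
    for x, y in deqs:
        fds += [(frozenset([x]), y), (frozenset([y]), x)]
    for x, _value in veqs:
        fds.append((frozenset(), x))
    return fds

# e = R[2=1]S with FDs 1->2 and 3->4 and the DEQ 2=3
fds1 = [(frozenset([1]), 2), (frozenset([3]), 4)] + eq_to_fds([(2, 3)], [])
print(4 in closure({1}, fds1))  # True: 1->4

# e = R[2='8'] with FD 1,2->3 and the VEQ 2='8'
fds2 = [(frozenset([1, 2]), 3)] + eq_to_fds([], [(2, "8")])
print(3 in closure({1}, fds2))  # True: 1->3
```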


There is another kind of interaction which is more difficult
to detect. It is a problem which could not occur if we only
considered constraints on a single relation, and it is to this
problem that much of this paper is devoted. To illustrate,
consider the following simple example:

s: R(2), R:1->2

e: R[1=1]R

For this expression, the derivable constraints will include the
FDs 1->2 and 3->4, and also the EQ 1=3. But the two FDs are
really the "same" since they arise from the same FD on the same
base relation. Because their domains are equated by the EQ,
their ranges must also be equal, i.e., the EQ 2=4 must also
hold.
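A quick check on invented data shows the effect. In the sketch below, joining R with itself on domain 1 forces domains 2 and 4 to agree whenever R:1->2 holds, whereas joining on domain 2 (which is not the left-hand side of any FD) forces nothing comparable.

```python
# Hypothetical state of R(2) in which R:1->2 is true.
R = {(1, "a"), (2, "b"), (3, "a")}

# e = R[1=1]R: tuples r + s with r[1] = s[1] (domains 1 and 3 of e)
e = {r + s for r in R for s in R if r[0] == s[0]}
print(all(t[1] == t[3] for t in e))   # True: the EQ 2=4 holds

# Contrast: R[2=2]R joins on domain 2; domains 1 and 3 need not agree.
e2 = {r + s for r in R for s in R if r[1] == s[1]}
print(all(t[0] == t[2] for t in e2))  # False: (1, 'a', 3, 'a') is in e2
```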

In more general terms, whenever one or more relations appear
as components on both sides of a join, some of the FDs on the
join might actually be the "same".

The following is a slightly more complicated example:

s: R(3), S(3), T(3)

   R:1->2

e: (R∩S)[1,2][1=4](S[2=1]R)

In the expression e, the (same) FD on R appears as 1->2 and as
1->7. A similar effect occurs with unions which will be
illustrated shortly.


To tell when two FDs have the same origins, they need to be
given labels or identifiers, so that when two labels are equal
(or "essentially" equal as explained below), the two FDs are
one and the same. Note that the "origins" of an FD in a
relational expression will be tree-like. The tree structure is not
only due to the axioms for FDs, especially pseudotransitivity,
but also to the tree-structure of the relational expressions
themselves. The FD identifiers are essentially derivation
trees.

The next examples will informally describe what FD identifiers
look like. [...]

The FD 1,3->6 will be derivable in e by rules we will present
shortly. To see what the label of this FD is, we will make an
informal derivation. [...] the domain names of S have been shifted
by four units (the length of R). So the FD S:1->2 in the join
can be represented as [...] where the arcs [...] represent the
relabelling. (In this and in the following examples, the roots of
trees will be either at the left or at the top.) The EQ constraint
(which gives an FD 4->5) is valid in the join R[4=1]S because of
the join condition [...] 4 to the above FD label, obtaining:

6—2—S—1—5—4

as a label for the FD 4->6 which is valid in the join. The FD
R:2,3->4 is not renamed, so its label in the join is the tree:
[...]

The FD R:1->2 combines with R:2,3->4 by transitivity yielding
1,3->4, whose label is [...]

We again apply transitivity to derive in the join the FD 1,3->6.
Thus the FD in the first example can be represented by the
following tree: [...]


When selections occur, there are special nodes which appear
in FD labels. For example:

s: R(3), R:1,2->3
e: R[2='8']

As we have mentioned before, the FD 1->3 is valid in this
expression. Its label should be the following: [...]


The next example illustrates the case when the label may have
more than one leaf with the same domain name: [...]

e: (R[1=2])[1=2]S

In e, we will have the FD 1,3->4 which results from the FD
R:1,2,3->4 and the restriction on R in e. Its label is the
following: [...]

The EQ from the join-condition, 1=6, will combine with the FD
S:1->2, which in the join is 5->6, to give the FD 5->1 with the
label [...]

Composing the FDs 1,3->4 and 5->1 in e to produce 3,5->4 will
correspond to merging the previous two trees at the '1' nodes.
Since there are two terminal '1' nodes in the tree for 1,3->4,
two copies of the tree for 5->1 must be joined.
These examples suggest the general approach: An FD Z->A on a
base relation R is represented by itself: a tree whose root is
A, whose leaves are the domains in Z and with an interior node
labelled R. Whenever we derive a new FD by pseudotransitivity,
we "graft" the two corresponding labels together, attaching copies
of the root of one to the matching leaves of the other. Whenever a
new FD is derived from an old one by using a DEQ, we can add a
new label to the appropriate leaves and/or to the root. The
shrinking of the left-hand-side of an FD by application of a VEQ
can be reflected in the tree label by adding to the appropriate
leaves a terminal node labelled with the VEQ's value.


The purpose of the FD identifiers just described is to
provide a means to tell when two FDs are the same. However, the
labels need not be identical for the FDs to be the same. Being
able to tell when two identifiers or derivation trees represent
the same FD is an important part of our algorithm. Before
presenting the rules for the equivalence of FD identifiers, we
will give some examples of FD identifiers being the same.

First,  take  the  following  example: 

s: R(2), S(2)

   R:1->2

e: R[1=1](S[1=1]R).

The FDs 1->2 and 1->6 in e have by the construction the
respective labels:

2—R—1 and 6—4—2—R—1—3—5—3—1.

The two FDs are the same, although their identifiers are not.
However, the identifiers differ only in the "renaming" of the
nodes. This example generalizes to the rule that if any two FDs
have labels which differ only in "renaming segments", then they
are the same FD.

There are two other ways in which FD identifiers may differ
while still describing the same functional dependency. We will
give two examples to illustrate:

s: R(4), R:1,3->4, R:1->2, R:3->2
   S(3), S:1,2->3
e: R[2,4=2,1]S

In e there is the FD 1,3->7 which can be represented by either
of the trees: [...]
If e occurred twice within a larger relation, we might have
derived  the  FD  in  the  two  occurrences  with  the  two  different 
identifiers.  We  would  have  to  know  that  these  identifiers 
represent  the  same  FD.  (These  identifiers  represent  the  same  FD 
because  the  part  R:l->2  will  determine  the  same  R-tuple  as 
R:3->2  due  to  the  presence  of  the  FD  R:l,3->4.) 

The second example is:

s: R(4), R:1,2->3, R:1,4->2, R:1,4->3
e: (R[2=2]R)[1,5='0','0']

In e, the FDs 4->7 and 4->3 can be associated, respectively,
with the trees:

[left-hand tree for 4->7 not recoverable]

      3
      |
      R
     / \
    1   4
    |
   '0'

We must be able to "collapse" the identifier on the left and
recognize it as equalling the one on the right. (Composing FDs
across joins of the same relation is the same as composing FDs
within a single relation.)


Finally, we note that a single FD may need to have several
identifiers associated with it. If an expression e is a union
e1 U e2, and an FD Z->A is valid in e, then we will need to
associate one or more identifiers for Z->A from e1 and one or
more from e2. A more detailed example will be given in Section
4.

When the FD e:Z->A has the identifier id associated with it,
we will write the association as e:id:Z->A or just id:Z->A and
call this a labelled FD. When Z->A has several identifiers
associated with it², we will write {id_i : i ∈ S}:Z->A or just
{id_i}:Z->A.

The first part of the algorithm for calculating constraints
on expressions consists of a procedure for generating a "free
structure"³. With respect to two FD identifiers, the free
structure will show the equivalence or non-equivalence of the
identifiers.

First we define a function "gentab" which generates (as
side-effects) tables of tuples with formal values. Then a
function "infervals" equates values according to the FDs present.

If the appropriate domains of the two tuples corresponding to
the roots of the two identifiers id1, id2 are equated then the
value of the predicate Eqv(id1,id2) is true, otherwise it is
false.

² The need for several identifiers will be explained shortly.

³ This concept is similar to the notion of Henkin
interpretation [Henk].
function Eqv(id1, id2 : FD-ident) : boolean;
begin
    fstr := empty;    /* start with empty state */
    gentab(id1);
    gentab(id2);
    infervals;
    if id1(fstr;V) = id2(fstr;V)
        then Eqv := true
        else Eqv := false
end;

function gentab(id : FD-ident) : formal-value;
begin
    if id is a domain leaf 'n' then gentab := vn   /* n-th formal value */
    else if id is a value leaf 'v' then gentab := v
    else if succ(id) is a domain node then gentab := gentab(succ(id))
    else /* succ(id) is a relation name R; root(id) is a domain A, and
            the children of node R are id1,...,idn whose roots are
            labelled Z1,...,Zn */
        begin
            add new tuple t to R(fstr);
            for i:=1 to n do t[Zi] := gentab(idi);
            for Y not in {Z1,...,Zn} do t[Y] := new formal value;
            gentab := t[A]
        end
end;

procedure infervals;
begin
    changes := true;
    while (changes) do
        begin
            changes := false;
            for each R in schema do
                for each t1, t2 in R(fstr) do
                    if FD R:Z->A and t1[Z]=t2[Z] and t1[A]≠t2[A]
                    then begin changes := true; equateval(t1[A], t2[A]) end
        end
end;

procedure equateval(x, y : formal-value);
begin
    for each R in schema do
        for each t in R(fstr) do
            for each domain Z do if t[Z]=x then t[Z] := y
end.
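The procedure above can be approximated in a few lines of Python. The sketch below is an interpretation under stated assumptions, not a transcription: identifiers are encoded as hypothetical nested tuples (`("dom", n)` for a domain leaf, `("val", v)` for a value leaf, `("relabel", sub)` for a renaming segment, and `("rel", R, A, {Z: sub})` for a relation node with root domain A), and a union-find table takes the place of equateval's in-place substitution. Keeping constants as class representatives, and refusing to merge two distinct constants, are choices made here for the sketch.

```python
import itertools

def eqv(id1, id2, schema, fds):
    """schema: {relation name: degree}; fds: (name, lhs tuple, rhs), 1-based."""
    fstr = {name: [] for name in schema}   # free structure: tuples per relation
    parent = {}                            # union-find over formal values
    fresh = itertools.count()

    def find(v):
        while parent.get(v, v) != v:
            v = parent[v]
        return v

    def union(x, y):                       # plays the role of equateval
        x, y = find(x), find(y)
        if x == y:
            return False
        if x[0] == "const":                # keep a constant as representative
            x, y = y, x
        if x[0] == "const":                # two distinct constants: no merge
            return False
        parent[x] = y
        return True

    def gentab(ident):
        kind = ident[0]
        if kind == "dom":                  # domain leaf n: n-th formal value
            return ("leaf", ident[1])
        if kind == "val":                  # value leaf: the constant itself
            return ("const", ident[1])
        if kind == "relabel":              # renaming segment: skip it
            return gentab(ident[1])
        _, name, a, children = ident       # relation node
        t = {z: gentab(sub) for z, sub in children.items()}
        for y in range(1, schema[name] + 1):
            if y not in t:
                t[y] = ("new", next(fresh))
        fstr[name].append(t)
        return t[a]

    def infervals():
        changed = True
        while changed:
            changed = False
            for name, lhs, rhs in fds:
                for t1, t2 in itertools.combinations(fstr[name], 2):
                    if all(find(t1[z]) == find(t2[z]) for z in lhs):
                        changed |= union(t1[rhs], t2[rhs])

    v1, v2 = gentab(id1), gentab(id2)
    infervals()
    return find(v1) == find(v2)

schema = {"R": 2, "S": 2}
fds = [("R", (1,), 2)]
id1 = ("rel", "R", 2, {1: ("dom", 1)})
id2 = ("relabel", ("rel", "R", 2, {1: ("relabel", ("dom", 1))}))
id3 = ("rel", "R", 2, {1: ("dom", 3)})
print(eqv(id1, id2, schema, fds))  # True: they differ only in renaming
print(eqv(id1, id3, schema, fds))  # False: different leaf domains
```

Each successful merge reduces the number of value classes, so the loop in `infervals` terminates.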

The predicate Eqv constitutes one part of our algorithm for
determining valid constraints on relational algebra expressions.
The next definition specifies the inference rules which we have
already informally discussed for deriving constraints on a
single expression. This constitutes the second part of the
algorithm:

Let s be a schema; let e be an expression over s. The rules
for deriving new EQs and labelled FDs from old ones on e are as
follows⁴:
(1) X=Y |- Y=X

(2) X=Y, Y=Z |- X=Z

(3) X=Y, Y=V |- X=V

(4) X=V, Y=V |- X=Y

(5) |- X=X

(6) X=Y |- id:X->Y, where id is X->Y

(7) X=V |- id:∅->X, where id is 'v'->X

(8) {id1i}:Z->A1, {id2j}:Z->A2, [...] |- A1=A2

(9) {id1i}:Z->A, {id2j}:AUX->B |- {id3ij}:ZUX->B,
    where id3ij is obtained from id2j by merging a copy of id1i
    to every leaf of id2j labelled A.


Any expression will always have just a finite number of
derivable constraints. (There are only a finite number of EQs and
FDs on an expression altogether.) The process of generating all
derivable constraints with these rules from a given set of
constraints will be called taking the closure and will be denoted
'Cl'.


Two additional conditions are used in the closure
construction:

[a] If an FD is already present, it will not be added with
    another identifier. This will prevent non-termination from
    repeated application of rule (9) with FDs from rule (6).

[b] Any FDs Z->A such that an FD Z1->A is present with Z1 ⊆ Z,
    Z1 ≠ Z, are removed.

⁴ For technical reasons which will become apparent
shortly, we have not explicitly included the augmentation
rule. However, our notion of completeness will be
appropriately modified, and there will be no loss of
generality.

We have specified inference rules for constraints on one
expression from given constraints. Now we want to give rules
for generating constraints on expressions from constraints
derived on subexpressions. First we will informally discuss how
this should be done, and then we will present the formal rules:

For a base relation R, we take the FDs in the schema belonging
to R and label them by themselves. Then the closure operator
is applied.

For projections, there are two special points. First, in
order for an FD or an EQ to "survive" a projection, all of the
referenced domains must be included in the projection. Second,
we must rename the domains by the order of their appearance in
the projection list. Take the following example:

s:  R(10), R:2,4,6->8
e1: R[2,4,6,8]
e2: R[1,2,3,4,6,8]
e3: R[8,6,2,4]
e4: R[2,6,7,8]

In the projection e1, the FD appears as 1,2,3->4. In e2, the FD
is 2,4,5->6. In e3, it is 2,3,4->1, and in e4, there is no FD.
If C is the set of derivable constraints for some expression e,
then to get the set of derivable constraints for a projection
e[X], we need to find all the constraints in C which survive the
projection, but we do not need to apply the closure operator.

The reason is that C is already closed, and if one or more
constraints in C survive the projection and also combine according
to one of the rules (1)-(9) defining closure, then the resulting
constraint will also survive projection because every domain in
the result appears in at least one of the original constraints.
Notationally, the condition for Z->A to be derivable on e[X] is
that X[Z]->X[A] be derivable on e. The notation X[Z] is like
array subscripting: An attribute A is in X[Z] if and only if
there is some j ∈ Z such that A is in the j-th position of X.

For example, if X is 2,8,4,6 and Z is {2,4}, then X[Z] is the
set {6,8}.
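The subscripting notation and the "survival" test translate into two small functions. The names are hypothetical, domains are 1-based, the projection list X is assumed to contain no repeated domains, and the FD used in the usage lines (2,4,6->8 on a relation of degree 10) is sample input chosen for illustration.

```python
def subscript(x, z):
    """X[Z]: the domains of e found at the positions in Z of the list X."""
    return {x[j - 1] for j in z}

def project_fd(x, lhs, rhs):
    """Rename the FD lhs->rhs on e into the numbering of e[X].

    Returns (new lhs, new rhs), or None when some referenced domain
    is missing from X, i.e. the FD does not survive the projection.
    """
    if rhs not in x or any(a not in x for a in lhs):
        return None
    pos = {a: i + 1 for i, a in enumerate(x)}
    return ({pos[a] for a in lhs}, pos[rhs])

print(subscript([2, 8, 4, 6], {2, 4}))            # the set {6, 8}
print(project_fd([2, 4, 6, 8], {2, 4, 6}, 8))     # ({1, 2, 3}, 4)
print(project_fd([2, 6, 7, 8], {2, 4, 6}, 8))     # None: domain 4 is dropped
```

With these helpers, Z->A is derivable on e[X] when `subscript(X, Z) -> subscript(X, {A})` is derivable on e, which is the condition stated in the text read in the other direction.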

In cross products, renaming also occurs. Given relations
R(m) and S(n), the domains of S in the cross product RXS have
been "shifted" by the length m of R. That is, they number m+1
through m+n. All constraints of S are valid in the cross
product, but they have also been renamed accordingly.
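The renaming for cross products amounts to adding the degree of the left operand to every domain number of the right operand. A minimal sketch, assuming unlabelled FDs represented as (left-hand set, right-hand domain) pairs and a hypothetical function name:

```python
def cross_fds(fds_r, fds_s, m):
    """FDs on R X S from the FDs on R (of degree m) and on S.

    The FDs of R keep their numbering; those of S are shifted by m.
    """
    shifted = [({z + m for z in lhs}, rhs + m) for lhs, rhs in fds_s]
    return list(fds_r) + shifted

# R(2) with 1->2 and S(2) with 1->2 give 1->2 and 3->4 on R X S.
print(cross_fds([({1}, 2)], [({1}, 2)], m=2))  # [({1}, 2), ({3}, 4)]
```

EQs on the right operand would be shifted in the same way; the labelled identifiers of the paper would additionally record the relabelling.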

For a restriction e[X=Y], we add the constraint X=Y to the
ones already holding in e and take the closure. For a selection
e[X=V], we similarly add X=V to the constraints of e and take
the closure.

For a union e1Ue2, a constraint basically must hold in both
components in order to hold in e1Ue2; any constraint valid on
only one component can be violated by tuples in the other
component, and these tuples will appear in the union. To get the
EQs valid in e1Ue2, we simply take the intersection of the sets
of valid EQs for e1 and e2. To calculate the valid FDs, the FD
identifiers must be considered; i.e., "intersection" must make
sure that the FD identifiers are Eqv. The next example will
illustrate this:


s:  R(2), S(2)
    R:1->2, S:1->2
e1: RUS
e2: R[2='2'] U R[2='5']

In e1, the FD 1->2 does not hold. For example, in the state st,
where R(st) = {(0,0)} and S(st) = {(0,1)}, the FDs on the base
relations are true, but (RUS)(st) = {(0,0),(0,1)}, and this
violates the FD 1->2.

Now consider the expression e2. Clearly the FD 1->2 will be
valid in this expression (since, e.g., R[2='2'] U R[2='5'] ⊆ R).
This is because 1->2 is the "same" FD in each component. In
general, for an FD id:Z->A to appear in e1Ue2, it must be valid
in both e1 and e2 and have the "same" identifier in both
components.
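Both union examples can be replayed on small states. The counterexample state for RUS is the one given in the text; the state used for the second expression is invented, and the helper names are hypothetical. As before, the second check shows truth in one state only.

```python
def fd_true(rel, zs, a):
    """True if tuples agreeing on domains zs also agree on domain a (1-based)."""
    seen = {}
    for t in rel:
        key = tuple(t[z - 1] for z in zs)
        if seen.setdefault(key, t[a - 1]) != t[a - 1]:
            return False
    return True

def select(rel, x, v):
    """e[X=V] on a set of tuples."""
    return {t for t in rel if t[x - 1] == v}

# R U S: each base FD is true, but the union violates 1->2.
R = {(0, 0)}
S = {(0, 1)}
print(fd_true(R, [1], 2), fd_true(S, [1], 2))  # True True
print(fd_true(R | S, [1], 2))                  # False

# R[2='2'] U R[2='5'] is a subset of R, so 1->2 carries over.
R2 = {(1, "2"), (2, "5"), (3, "7")}            # hypothetical state
u = select(R2, 2, "2") | select(R2, 2, "5")
print(u <= R2, fd_true(u, [1], 2))             # True True
```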

If the two FDs are Eqv, then the FD is in the union. We
must, however, attach both identifiers to the FD. The following
example will illustrate why.


(  ( ( R ( 4»4 1 R) f 5-51 R) (*-*] R)  [7+1- 21  + 1,7+ 2- 14  +  2, 14  +  3-21  +  31 
[7tl#7+2,l+3,0+7] 


( { R ( 4-4  ]  R)  f  5-5] R)  (A-*)R)  [  7+ 2- 14  + 2 , 7+ 2- 21  +  2 , 1 4  + 3- 21  «•  3  ] 
[7+1, 7+2, 14+3, 0+7] 


We have written each domain in the restrictions as a
displacement plus a domain of R; e.g., 14+2 denotes the second
domain of the third copy of R in the join. In each of e1, e2
and e3, the FD 1,2,3->4 is derivable and will have the following
respective identifiers (with relabelling nodes removed):


And we have Eqv(id1,id2) and Eqv(id2,id3), but not Eqv(id1,id3).
However,  on  e,-  this  FD  is  not  valid.  If  we  had  only  retained 


for  the  FD  in  e^,  then  we  would  have  incorrectly  derived  the 
n  •  , .  We  could  conceivably  have  a  procedure  to  decide  which 


identifier should be retained, but it is simpler just to keep
them all.

The constraints derivable on a set difference will be the
constraints derivable on the first component. There will be at
least that many derivable constraints because the difference is
a subset of the first component. More will be said about set
difference later.

This  discussion  is  formalized  by  the  following  definition: 

Let s be a schema and e an expression over s. The set Drv(e)
of derivable constraints on e is defined by the following rules,
which use induction on the number of operations in e and in
which "Cl" denotes the closure operator:

(1) Drv(R): Take the closure of R's FDs in the schema and
    have each FD R:Z->A labelled by the tree consisting of a
    root segment R -A (A is the root) and an arc d- into node
    R for each d ∈ Z.

(2) Drv(e[X]): Take all DEQs Y=Z where X[Y]=X[Z] is in
    Drv(e), all VEQs Y=V where X[Y]=V is in Drv(e), and all FDs
    id2:Z->A such that id1:X[Z]->X[A] is in Drv(e), where id2
    is obtained from id1 by attaching to each leaf X[d] of id1
    the arc d-, and to the root X[A] of id1 the arc -A.

(3) Drv(e[X=Y]): Add X=Y to Drv(e) and take the closure.

(4) Drv(e[X=V]): Add X=V to Drv(e) and take the closure.

(5) Drv(e1×e2): Rename the constraints in Drv(e2) according
    to the degree of e1, i.e., a DEQ X=Y becomes X+k=Y+k
    (k=degree(e1)); a VEQ X=V becomes X+k=V, and an FD id1:Z->A
    becomes id2:Z+k->A+k, where id2 is obtained from id1 by
    adding an arc d+k- to each leaf labelled d, and an arc
    -A+k to the root. Then add Drv(e2) renamed to Drv(e1) and
    take the closure.

(6) Drv(e1∪e2): An EQ X=Y or X=V is in Drv(e1∪e2) if it is in
    both Drv(e1) and Drv(e2). If {id1i}:Z->A is in Drv(e1) and
    {id2j}:Z->A is in Drv(e2) and ∀i,j Eqv(id1i,id2j), then
    {id1i,id2j}:Z->A is in Drv(e1∪e2).

(7) Drv(e1-e2): Use Drv(e1).

We will say that e is consistent (with respect to s) if for
no domain X and for no distinct values V1, V2 are both X=V1 and
X=V2 members of Drv(e). It is meaningless to try to derive
constraints on expressions which are not consistent. □
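
As a small illustration of the consistency condition, the sketch below scans the value equalities (VEQs) of a constraint set and reports whether some domain is forced to two distinct values. The encoding of a VEQ X=V as a pair `(domain, value)` and the name `is_consistent` are assumptions of this example, not part of the definition.

```python
def is_consistent(veqs):
    """veqs: iterable of (domain, value) pairs, each standing for X = V.

    Returns False as soon as one domain is bound to two distinct values.
    """
    value_of = {}
    for domain, value in veqs:
        if value_of.setdefault(domain, value) != value:
            return False
    return True


print(is_consistent([(1, "a"), (2, "b"), (1, "a")]))  # same value repeated
print(is_consistent([(1, "a"), (1, "b")]))            # clash on domain 1
```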


Our  algorithm  now  consists  of  the  functions  Drv,  Cl  and  Eqv. 
With  these  rules,  a  solution  may  be  proposed  to  Problem  2  posed 
earlier: 


Proposition. Given a schema s and an expression e, an EQ e:X=Y
or e:X=V is valid if and only if it is in Drv(e); an FD
e:Z->A is valid if and only if for some subset Z1 ⊆ Z,
e:Z1->A is in Drv(e).**

The verification of this proposition has two parts. First,
are the rules sound? That is, are any invalid FDs or EQs
generated? Second, are the rules complete? Do they find all of
the valid FDs and EQs? The question of soundness is considered
in the next section.

** We have left out the augmentation rule, but this notion
of soundness/completeness means that the rule can always be
applied as the very last step in a derivation.

4. Soundness of the Algorithm

The formal development of the operator Drv was in three
parts: First, Eqv was defined; then we defined the closure
rules using Eqv, and finally the derivable constraints Drv in
terms of closure and Eqv. To prove the soundness of the rules,
we first prove some properties of Eqv and Cl and then properties
of Drv.

Functional dependencies, as their name implies, are closely
related to ordinary mathematical functions. If, say, in R(n)
there is the FD Z->A, then given any state st, the projection
R[Z,A](st) defines a partial function U^k -> U, where k is the
number of domains in Z (and U is the data universe of the
state). Similarly, given any state, FD identifiers define
partial functions. It is convenient to take as the domain of these
latter functions the set of infinite sequences of elements of
the universe. In this way, all functions will have the same
domain. The function we associate with an identifier will then
actually depend only on the positions in the sequence
corresponding to the integer labels of the leaves of id. That
is, if, for example, the leaves of id are 2,5,7, then in an
argument (x1,x2,...,xn,...), only values x2, x5 and x7 will affect
the value of the function. Given a state st, we will denote the
function determined by an identifier id by id(st) or id(st; ).
We will first define the functions determined by identifiers,
and then we will compare these functions with the FDs to which
the FD identifiers are attached.

The partial function id(st) whose domain is the set of
sequences of universe elements is defined by the following
Algol-like procedure, where x is a sequence of domain values:

id(st;x) =

    if id is a domain leaf i then x_i
    else if id is a value leaf 'v' then v
(*) else if succ(id) is an integer node then succ(id)(st;x)
    else /* succ(id) is a relation name R, root(id) is a
            domain A, and the set of children of R is
            {id1,...,idn} whose roots are labelled Z1,...,Zn */
        if id1(st;x),...,idn(st;x) are defined
        then begin
            v1 <- id1(st;x);
            ...
            vn <- idn(st;x);
            if R[Z1,...,Zn=v1,...,vn][A](st) = {a}
                /* i.e., if there is a tuple in R(st) whose
                   Z-values equal the v's */
            then a   /* i.e., return the unique A-value */
            else undefined
        end
        else undefined □
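
The procedure can be approximated by a short Python sketch. Everything about the encoding is an assumption made for illustration: identifiers are written as root-down trees built from the hypothetical classes `DomainLeaf`, `ValueLeaf`, `Relabel` and `RelNode`, a state is a dict from relation names to sets of tuples, the sequence x is a dict from positions to values, and "undefined" is represented by `None` (so `None` cannot be used as a data value).

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class DomainLeaf:
    index: int        # position i of the value sequence x (1-based)


@dataclass(frozen=True)
class ValueLeaf:
    value: Any        # the constant 'v'


@dataclass(frozen=True)
class Relabel:
    child: Any        # integer (relabelling) node: evaluation passes through


@dataclass(frozen=True)
class RelNode:
    relation: str                           # relation name R
    out_domain: int                         # domain A read from the matching tuple
    children: Tuple[Tuple[int, Any], ...]   # pairs (Z_i, sub-identifier)


def evaluate(ident, state, x):
    """Return id(st;x), or None where the partial function is undefined."""
    if isinstance(ident, DomainLeaf):
        return x.get(ident.index)
    if isinstance(ident, ValueLeaf):
        return ident.value
    if isinstance(ident, Relabel):
        return evaluate(ident.child, state, x)
    wanted = {}
    for z, child in ident.children:
        v = evaluate(child, state, x)
        if v is None:
            return None                     # some argument is undefined
        wanted[z] = v
    # A-values of the tuples of R whose Z-values equal the v's.
    a_values = {t[ident.out_domain - 1]
                for t in state[ident.relation]
                if all(t[z - 1] == v for z, v in wanted.items())}
    return a_values.pop() if len(a_values) == 1 else None


state = {"R": {(0, 1, 2), (3, 4, 2)}}
id_12_3 = RelNode("R", 3, ((1, DomainLeaf(1)), (2, DomainLeaf(2))))
print(evaluate(id_12_3, state, {1: 0, 2: 1}))   # a tuple (0,1,_) exists in R
print(evaluate(id_12_3, state, {1: 0, 2: 4}))   # no tuple (0,4,_) in R
```

Returning `None` both when no tuple matches and when the A-value is not unique mirrors the "else undefined" branches of the procedure.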

Now identifiers were not associated with FDs in a haphazard
fashion. They were chosen so that the function determined by
the identifier essentially equals the function determined by the
FD. To decide, as in rule (8) of the definition of closure, if
A1=A2 should be inferred from id1:Z->A1 and id2:Z->A2 is then a
question of deciding if id1(st) = id2(st) for all states st,
i.e., if the functions determined by the identifiers are equal.
We are given as an hypothesis of rule (8) in the definition of
closure the truth of Eqv(id1,id2). Hence, we need to know that
the truth of Eqv means equality of the identifier functions.

This  is  proved  in  the  following  theorem: 

Theorem 1. Let s be a schema and suppose id1 and id2 are
associated with FDs derived on expressions over s. Then
Eqv(id1,id2) true implies that for all states st and value
sequences x, if id1(st;x) and id2(st;x) are defined, then they
are equal.

Proof. The proof may be found in the appendix. □

To  prove  completeness,  it  will  be  necessary  to  know  that  the 
converse  of  the  above  theorem  will  hold.  This  is  proved  in  the 
next  theorem. 

Theorem 2. Let s be a schema and let id1 and id2 be associated
with FDs derived on expressions over s. Then Eqv(id1,id2) false
implies that there exists a state st and a value sequence x such
that id1(st;x) and id2(st;x) are defined and unequal.

Proof.  This  is  immediate  from  the  definition  of  Eqv:  Take  st 
to  be  the  "free  state"  fstr  and  the  value  sequence  x  to  be  V.  □ 

We have claimed that the function determined by an FD Z->A
defined on an expression e is essentially the same as the
function determined by the associated identifier. The precise
meaning of "essentially" is the following: Suppose a state st
and a sequence x are given. The values of x which are relevant
to the FD are x[Z]. The application of the function to the
input x corresponds to the selection expression e[Z=x[Z]]. To
extract the A-value, the projection e[Z=x[Z]][A] is used. If
the function is defined at st and x, there is one element in
e[Z=x[Z]][A](st); otherwise it is empty. The "essential"
equality is then: a ∈ e[Z=x[Z]][A](st) if and only if id(st;x) is
defined and equal to a. We do not prove this equality
separately but together with the soundness statement itself
below.

With the above constructions, we can prove the soundness of
the closure operator. We hypothesize two properties of a set S
of constraints. One property says that all elements of S are
valid constraints. The other property is a statement that FD
identifiers "agree" with the FDs as discussed above. The
theorem then states that the closure operator preserves these
two properties. Of course, we are primarily interested in the
preservation of validity. However, the second property is
needed during the induction steps and for the completeness
theorem.

Theorem 3. Let s be a schema and e an expression over s, and
consider the following two statements:

(i) every constraint in the set is valid;

(ii) for every FD {idj}:Z->A in the set, for every state and
     every sequence x of universe elements, a ∈
     e[Z=x[Z]][A](st) if and only if a = idj(st;x) for some j.

Let S be a set of EQs and labelled FDs defined on an
expression e such that (i) and (ii) hold on S. Then (i) and (ii)
hold on Cl(S).

There is no obvious way to adjust FD identifiers for
an augmentation rule. If the identifier for, say,
Z∪X->A is the same as for Z->A, then the above
relationship will only hold in the "only if" direction.


Proof. The proof uses induction on the length of the derivation
and can be found in the appendix. □


With this theorem, it is now easy to show the soundness of
Drv. (We again must prove the extra clause (ii) as in Theorem
3.)


Theorem 4. Let s be a schema, and let e be an expression over
s. Then statements (i) and (ii) of Theorem 3 hold on the set
Drv(e).


Proof. This proof uses induction on the number of relational
algebra operators in e and can be found in the appendix. □


Corollary. Let s be a schema and e be an expression over s.
Then every EQ in Drv(e) is valid, and every FD e:Z->A such that
Z1->A is in Drv(e) for some Z1 ⊆ Z is valid.


5. Completeness of the Algorithm


We now know that Drv does not generate any invalid
constraints. The next step in a solution to the problem of
calculating constraints on relational algebra expressions is to prove
completeness: Does Drv generate all of the valid constraints?
As we have defined Drv, the answer is no, Drv does not generate
all valid constraints. The following example illustrates this:

s:  R(2), S(2), R:1->2
e:  (R∪S)-(S-R)

Drv(R∪S) = ∅

Drv(e) = ∅

Clearly, the expression (R∪S)-(S-R) is equivalent to R, and
therefore the FD 1->2 is valid in (R∪S)-(S-R). However, the
rules calculate no FDs on e. It seems, then, that the
constraint formula for set difference must have some knowledge of
when formulas are equivalent so that it can recognize, at least
as far as FDs and EQs are concerned, (R∪S)-(S-R) as being
equivalent to R. It turns out, however, that this is
impossible: Functional dependencies are sufficient to capture the
notion of equivalence of two expressions, and this problem is
undecidable.

The  following  definition  specifies  what  we  mean  by 
equivalence,  and  the  next  theorem  shows  the  relationship  between 
FDs  and  the  equivalence  property: 

Let s be a schema, and let e1 and e2 be expressions over s of
degree n. Then e1 and e2 are equivalent (with respect to s),
written e1≡e2, if for every state st, e1(st)=e2(st). □

Theorem 5. Let s be a schema, and let R be an n-ary relation
over s such that for every domain X of R, the FD R:∅->X is in
s. If e1 and e2 are expressions of degree n which do not
contain R, then e1≡e2 if and only if every possible FD is
valid in the expression E = R∪(e1-e2)∪(e2-e1).

Proof. If e1≡e2, then it is immediate that for every state st,
E(st) = R(st), and so every FD ∅->X is valid in E. Now
suppose e1≢e2. This means that there is some state st and some
tuple t such that either t ∈ e1(st) and t ∉ e2(st), or t ∈
e2(st) and t ∉ e1(st). In either case, t ∈
((e1-e2)∪(e2-e1))(st). Let t' be any tuple not in
(e1∪e2)(st). (If (e1∪e2)(st) = U^n, we can extend the universe U
by adding one more element u, and we can then construct tuple t'
from u.) Because R does not appear in either e1 or e2, we may
assume that R(st)={t'}. Then E(st) contains both t and t' and
therefore some FD ∅->X is not true in E(st). □
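
The reduction in this proof can be traced on concrete states. The sketch below is illustrative only: relation values are Python sets of tuples, and the helper names `fd_holds`, `E` and `all_empty_lhs_fds_hold` are assumptions introduced here. An FD ∅->X is encoded as an FD with an empty left-hand side, so it holds exactly when all tuples agree on domain X.

```python
def fd_holds(rel, lhs, rhs):
    """True if the FD lhs -> rhs (1-based domain numbers) holds in rel."""
    seen = {}
    for t in rel:
        key = tuple(t[d - 1] for d in lhs)
        val = t[rhs - 1]
        if seen.setdefault(key, val) != val:
            return False
    return True


def E(r, e1, e2):
    """Value of R u (e1 - e2) u (e2 - e1) for the given relation values."""
    return r | (e1 - e2) | (e2 - e1)


def all_empty_lhs_fds_hold(rel, degree):
    """True if the FD (empty set) -> X holds for every domain X of rel."""
    return all(fd_holds(rel, [], x) for x in range(1, degree + 1))


r = {(9, 9)}   # R holds a single tuple t', as its FDs require

# Equal values for e1 and e2: E collapses to R, so every FD holds.
print(all_empty_lhs_fds_hold(E(r, {(0, 1)}, {(0, 1)}), 2))

# Unequal values: the witness tuple (2, 3) joins t' in E and breaks an FD.
print(all_empty_lhs_fds_hold(E(r, {(0, 1)}, {(0, 1), (2, 3)}), 2))
```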

It is also possible to prove this theorem without using the
new relation R, that is, by only using relations appearing in e1
and e2.

This  theorem  will  be  enough  to  show  the  impossibility  of  cal¬ 
culating  all  valid  FDs  on  arbitrary  relational  algebra  expres¬ 
sions.  This  is  because  the  problem  of  determining  the 
equivalence  of  two  relational  algebra  expressions  is  undecidable 
as  the  following  theorem  states: 

Theorem 6. Let s be a schema. The following problem is
undecidable:

Given arbitrary relational expressions e1 and e2 over s of
degree n, determine if e1≡e2.

Proof. The proof is based on a similar theorem by
M. Solomon [Solo] and can be found in the appendix. □

With this undecidability result, we can prove the
undecidability of calculating constraints on expressions:

Theorem 7. The problem of determining all valid constraints in
a given relational expression of syntax (I) over a given schema
is undecidable.

Proof. If there were a procedure for calculating all
constraints valid in a relational algebra expression, then by
Theorem 5 there would be a procedure for determining the
equivalence of relational algebra expressions. But this latter
problem is undecidable by Theorem 6. □

Given this undecidability result, we next ask what changes
can be made to the mapping language which will allow a decidable
complete set of derivation rules.

There are two easy-to-specify restrictions which can be
placed on the relational algebra syntax which will eliminate the
above incompleteness. One restriction is to allow relation
names to appear at most once in an expression. Then expressions
such as (R∪S)-S will not be allowed, and the constraints valid
on a set difference will always be exactly the ones valid on the
first component of the difference.

Another restriction is to disallow set difference as a
relational algebra operator. Without set difference, we cannot
perform the reduction of Theorem 5, and the complete calculation of
constraints on relational algebra expressions not using set
difference will be possible. This second restriction is the one
we shall investigate.


Let us consider, then, the language of syntax (II) in which
set difference has been omitted. Does completeness hold now?
With this language, the answer is still no: not all FDs and EQs
are detected.

Before we give a counterexample (pointed out by E.F. Codd),
we will briefly discuss certain behavior of joins of
projections. For this discussion only, we will use different notation
which will allow examples of such joins to be written more
succinctly. The example relation R will have domains named A, B
and C. A join, denoted by '⋈', will be understood to be an
equi-join on the like-named domains of its components, and it
will eliminate one of the joined domains.

Ordinarily, a join of projections such as R[AC]⋈R[BC] will
contain all the tuples in R, plus extra tuples, i.e., R ⊆
R[AC]⋈R[BC], but R ≠ R[AC]⋈R[BC]. Take the following example:


     R          R[AC]      R[BC]      R[AC]⋈R[BC]

  A  B  C       A  C       B  C        A  B  C

  0  1  2       0  2       1  2        0  1  2
  3  4  2       3  2       4  2        0  4  2
                                       3  1  2
                                       3  4  2


If enough structure is added to R, namely in the form of FDs,
then no extra tuples will appear in the join; that is, the join
will be "non-loss". In our example, the FD C->B is sufficient
to cause R to have a non-loss join. (The above example violates
the FD C->B.) General necessary and sufficient conditions for a
relation to have a non-loss join are given in [AhBU].
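
The behavior of the join of projections can be reproduced with a few lines of Python. This is a sketch under simple assumptions: R is a set of (A, B, C) tuples, column positions are 0-based, and the helper names `project` and `join_on_c` are introduced here for illustration.

```python
def project(rel, cols):
    """Projection onto the given 0-based column positions."""
    return {tuple(t[c] for c in cols) for t in rel}


def join_on_c(ac, bc):
    """Equi-join of R[AC] and R[BC] on C, keeping a single copy of C."""
    return {(a, b, c) for (a, c) in ac for (b, c2) in bc if c == c2}


# The example relation: it violates C->B, so the join gains extra tuples.
R = {(0, 1, 2), (3, 4, 2)}
joined = join_on_c(project(R, (0, 2)), project(R, (1, 2)))
print(sorted(joined))
print(R < joined)            # R is a proper subset of the join

# A relation satisfying C->B: the join returns exactly the original tuples.
R2 = {(0, 1, 2), (3, 1, 2)}
joined2 = join_on_c(project(R2, (0, 2)), project(R2, (1, 2)))
print(joined2 == R2)
```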

Now let us return to constructing a counterexample to the
completeness problem. We use the relation R as above (returning
to the usual notation) and add a second FD 1,2->3 to the FD 3->2
already mentioned. We have:

s:  R(3), R:1,2->3, R:3->2

Drv(R[1,3]) = ∅

Drv(R[2,3]) = {2->1, 1->1, 2->2, 1,2->1, 1,2->2}

Drv(R[1,3][2=2]R[2,3]) = {4->3, 2->3, 1->1, ...}

Drv((R[1,3][2=2]R[2,3])[1,3,4]) = {3->2, 1->1, ...}

(The ellipsis represents the trivial FDs arising from the
reflexive rule.) Yet, we know that for any state st,

R(st) = (R[1,3][2=2]R[2,3])[1,3,4](st),

i.e., that R has a non-loss join. (The expression on the right
corresponds to R[AC]⋈R[BC].) Therefore, the FD 1,2->3 is valid
in the expression, but it is not calculated by Drv.

Intuitively, it is not hard to see why this happened. The FD
1,2->3 was lost through a projection, but the later join was
non-loss and so the FD reappeared. This problem can be
eliminated by restricting the mapping language. The problem arose
by doing projections and then undoing them by joins. A
restricted mapping language which does all joins (formally, cross
products) before any projections will be just as powerful as the
unrestricted language and will not cause Drv to lose any FDs as
happened above. This is what the third version of the
relational algebra syntax does. It first allows cross products on
base relations or selections. Then restrictions are allowed,
and lastly, union and projections are allowed. (The reason


The equivalences of Theorem 8 can be viewed as transformation
rules: Given an expression e according to syntax (II), match
some part of e with the left-hand side of an equivalence of
Theorem 8 to produce a new expression e' from the right-hand
side of the equivalence. Repeat the process with e'. Continue
as long as possible. The result will be an expression according
to syntax (III) which is equivalent to e. We record this result
in the following theorem.

Theorem 9. Every expression of syntax (II) is equivalent to an
expression of syntax (III). □

By this theorem, we are justified in assuming that all
expressions conform to syntax III.

Before we continue with the completeness problem, let us
review the example which was used to show that Drv was not
complete for syntax-(II)-expressions. Recall that we had a
relation R(3) with FDs 1,2->3 and 3->2. For the expression

(R[1,3][2=2]R[2,3])[1,3,4],

we calculated

Drv((R[1,3][2=2]R[2,3])[1,3,4]) = {3->2, 1->1, ...}.
In actual fact, the FD 1,2->3 is also valid but is not
calculated by Drv. Let us see what happens by applying the above
equivalence transformations. One of the two possible
transformations is:

(R[1,3][2=2]R[2,3])[1,3,4] --> (by [1a])

(R[3=2]R[2,3])[1,3,4,5][1,3,4] --> (by [1b])

(R[3=3]R)[1,2,3,5,6][1,3,4,5][1,3,4].

We calculate the constraints valid in this last expression as
follows:

Drv(R) = {1,2->3  3->2  ...}

Drv(R[3=3]R) = {1,2->3  3->2  3=6  4,5->6  6->5  1,2->6
                6->2  4,5->2  4,5->3  3->5  2=5  1,5->3
                2,4->6  1,5->6  2,4->3  ...}

                (3->2 and 3->5 are the same FD)

Drv((R[3=3]R)[1,2,3,5,6]) = {1,2->3  3->2  3=5  5->4  1,2->5
                             5->2  3->4  2=4  1,4->3  1,4->5
                             ...}

Drv((R[3=3]R)[1,2,3,5,6][1,3,4,5]) = {2=4  4->3  2->3  1,3->2
                                      1,3->4  ...}

Drv((R[3=3]R)[1,2,3,5,6][1,3,4,5][1,3,4]) = {3->2  1,2->3  ...}

These are exactly the constraints in R (to which the expression
is equivalent). The counterexample is no longer a problem, and
this is an intuitive argument that completeness holds for the
restricted syntax.

In the above discussions, we have first removed set
difference from the mapping language, and then we restricted the
syntax so that cross products are performed before projections.
With these restrictions we can finally prove completeness of the
derivation rules. We begin with a theorem which proves
completeness for expressions without the restriction operator
(syntax IV), and then we prove it for any expression of syntax III.

Theorem 10. Let s be a schema; let e be a consistent expression
over s expressed in syntax IV. Then any valid EQ on e is a
member of Drv(e), and if Z->A is valid on e, then for some Z1 ⊆
Z, Z1->A ∈ Drv(e).

Proof. Proofs of completeness theorems generally follow the
contrapositive direction: If c is not a member of Drv(e), i.e.,
if c is not derivable by the given rules, then c is not valid,
i.e., there is some state st such that c is false in e(st). The
state st is called a counterexample state. In the proofs of
completeness for FDs and for MVDs [BeFH], the counterexample
state is a single set of tuples since only one relation is being
dealt with. In the present situation there are (possibly) many
relations, each of which must be assigned a set of tuples for
the counterexample state. This cannot be done in as
straightforward a fashion as in the case for one relation because the
expression e may have several occurrences of the same base
relation. For example, suppose e has the form e1×e2 and that base
relation R appears in both e1 and e2. If we try to arrive at an
assignment of tuples to R by a recursive procedure (which is
natural since relational algebra expressions are recursively
defined), the procedure call on e1 may result in an assignment
to R which conflicts with that of the procedure call on e2. In
the proof of this theorem, which can be found in the appendix,
we are able to overcome this problem because of the special
structure of operations embodied in syntax IV. We first use
results by other authors for the case of base relations and
selections on base relations. Then for cross products of
selections, we show how counterexample tuple sets on subexpressions
can be integrated into the whole expression. The case of union
uses Theorem 2, and the case for projection is straightforward.


Now we want to use this theorem to prove completeness for
expressions of syntax III, which contain restriction operators.
We do this by defining a transformation from expressions of
syntax III to those of syntax IV; that is, it is a transformation
which removes occurrences of the restriction operator. This
transformation will not be an equivalence, but it will have two
properties which will be sufficient for our purposes. Namely,
the resultant expression will (have values which will) always be
a subset of the original expression, and the resultant
expression will have the same set of derivable constraints as the
original. We need to use this indirect approach because if we
tried a direct inductive argument for a restriction e[X=Y] we
could get a counterexample state on e; but after applying the
restriction operator [X=Y], we could never be sure that the
tuples were still present.


Let s be a schema and let e be an expression over s according
to syntax III. We may write e as r1[X1]∪...∪rn[Xn], where each
ri is a restriction of a cross product (see the definition of
syntax III). The expression G(e) over s conforming to syntax IV
is defined as follows:

Let r be one of the terms r1,...,rn, and let X=Y be one of
the restrictions in r. Let r* be r with the restriction X=Y
removed. Next, there are three cases: X=V ∈ Drv(r*) for some
V; Y=V ∈ Drv(r*) for some V; and no VEQ on X or Y is in Drv(r*).
In the first case replace r by r*[Y=V]; in the second case
replace r by r*[X=V], and in the third case replace r by
r*[X=V1][Y=V1] ∪ r*[X=V2][Y=V2], where V1≠V2 and neither V1
nor V2 appear in e or in s. Now rearrange the resulting
expression so that it is again of the form r1[X1]∪...∪rn[Xn], where
each ri is a restriction of a cross product. The new expression
will have one less restriction than the original. Repeat this
process until there are no more restrictions. The result is
G(e). □
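
One step of this transformation, the replacement of a single restriction X=Y by selections, can be sketched as follows. The sketch is hypothetical in several respects: the VEQs of Drv(r*) are assumed to be supplied as a dict from domain to value, the two fresh values are passed in by the caller, and the result is a list of alternatives to be unioned, each a list of selections `(domain, value)` to apply to r*. The name `eliminate_restriction` is not from the paper.

```python
def eliminate_restriction(veqs, x, y, fresh1, fresh2):
    """Replace the restriction X = Y on r by selections on r*.

    veqs   : dict domain -> value, the VEQs assumed to lie in Drv(r*)
    x, y   : the two domains of the restriction X = Y
    fresh* : two distinct values assumed not to occur in e or in s
    """
    if fresh1 == fresh2:
        raise ValueError("the two fresh values must be distinct")
    if x in veqs:                      # case 1: X = V is derivable on r*
        return [[(y, veqs[x])]]
    if y in veqs:                      # case 2: Y = V is derivable on r*
        return [[(x, veqs[y])]]
    # case 3: no VEQ on X or Y; use two copies with distinct fresh values
    return [[(x, fresh1), (y, fresh1)],
            [(x, fresh2), (y, fresh2)]]


print(eliminate_restriction({1: "a"}, 1, 2, "v1", "v2"))
print(eliminate_restriction({}, 1, 2, "v1", "v2"))
```

Repeating such a step once per restriction, and rearranging the result into a union of terms, corresponds to the iteration described in the definition.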

Theorem 11. Let s be a schema, and let e be an expression over
s of syntax III. Then Drv(G(e))=Drv(e), and for every structure
str, G(e)(str) ⊆ e(str).

Proof. The proof may be found in the appendix. □

The goal of this paper now appears as a simple corollary of
Theorem 11:

Theorem 12. Let s be a schema, and let e be a consistent
expression over s expressed in syntax III. Then any valid EQ on
e is a member of Drv(e), and if Z->A is valid on e, then for
some Z1 ⊆ Z, Z1->A ∈ Drv(e).

Proof. Suppose e is consistent and c ∉ Drv(e). Since e is
consistent, G(e) is consistent. If c is an EQ not in
Drv(e)=Drv(G(e)), or if it is an FD Z->A such that Z1->A ∉
Drv(e)=Drv(G(e)) for every Z1 ⊆ Z, then by Theorem 10, there is
a state st such that c is false in G(e)(st). Since G(e)(st) ⊆
e(st), c is also false in e(st). □


This paper has presented results about calculating functional
dependencies on relational algebra expressions. The motivation
for studying this problem arose from the desire to be able to
determine correct view mappings. A correct view mapping is one
in which the constraints in the underlying schema and the
properties of the mapping ensure that the constraints in the view
schema are always satisfied (in the view). Given the underlying
schema, the view schema and the mapping, we need a decision
procedure to tell us the correctness or incorrectness of the
mapping. This is a basic prerequisite for implementing multilevel
database systems [ANSI].

The basic facilities of view mappings can be modelled by
relational algebra, and a basic type of constraint appearing in
schemas is the functional dependency; hence, the problem of
recognizing correct view mappings is in large part the problem
of calculating the valid functional dependencies on relational
algebra expressions over the underlying schema. We studied this
problem in detail in the preceding sections.

The results were both negative and positive. We found that
the problem as stated is undecidable. If we drop the set
difference operator from relational algebra, however, there
is an algorithm for calculating functional dependencies on
relational algebra expressions. The algorithm makes use of the
derivation rules already known for functional dependencies on
one relation, but it is complicated by the need to compare
derivations of functional dependencies. With a suitable
modification to the relational algebra language which does not
decrease its power, we have shown that the algorithm is both
sound (no invalid functional dependencies are accepted) and
complete (all valid functional dependencies are found).
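The derivation rules for functional dependencies on a single relation, which the algorithm builds on, can be illustrated with the standard attribute-closure computation. The following is only a minimal sketch of that single-relation case: the representation of an FD as a pair and the names `attribute_closure`, `fd_is_derivable` and `example_fds` are assumptions made for illustration, and the sketch does not attempt the comparison of derivations that the full algorithm requires.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# An FD Z->A is represented as (frozenset of left-hand attributes, right-hand attribute).
# This representation is an assumption of the sketch, not the paper's notation.
FD = Tuple[FrozenSet[str], str]


def attribute_closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Return every attribute functionally determined by `attrs` under `fds`."""
    closure: Set[str] = set(attrs)
    fd_list: List[FD] = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fd_list:
            # Apply an FD once its whole left-hand side is already determined.
            if lhs <= closure and rhs not in closure:
                closure.add(rhs)
                changed = True
    return closure


def fd_is_derivable(lhs: Iterable[str], rhs: str, fds: Iterable[FD]) -> bool:
    """Z->A follows from `fds` exactly when A lies in the closure of Z."""
    return rhs in attribute_closure(lhs, fds)


# Hypothetical FDs on one relation: emp->dept and dept->mgr.
example_fds: List[FD] = [
    (frozenset({"emp"}), "dept"),
    (frozenset({"dept"}), "mgr"),
]
```

With these hypothetical FDs, `fd_is_derivable({"emp"}, "mgr", example_fds)` holds by transitivity, while `mgr->emp` is not derivable.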

In terms of the feasibility of constructing the mapping
processors in the framework of the ANSI/X3/SPARC Study Group, we
may conclude that the processors can in principle be
constructed, but care must be taken in the design of the mapping
language, or else the processors cannot exactly determine the
correct mappings.

We  are  currently  working  on  a  number  of  logical  extensions  to 
this  work: 

Although functional dependencies are very important constraint
types, they are not the only ones which have a wide
applicability. For example, there are constraints in hierarchical
models which require that each child segment occurrence have a
parent. In Codasyl network models, a similar construct is that a
record type has mandatory membership in a set type. In relational
models we have what are called foreign key constraints. These
are all cases of what we call subset constraints. Their
interaction with functional dependencies and mappings is being
investigated.
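A subset constraint of the foreign-key kind can be checked directly on a database state. The sketch below is an illustration only: relations are assumed to be lists of tuples addressed by 0-based column positions, and the function name and the sample `departments` and `employees` data are hypothetical.

```python
from typing import Sequence, Tuple

Row = Tuple[object, ...]


def satisfies_subset_constraint(
    child: Sequence[Row],
    child_cols: Sequence[int],
    parent: Sequence[Row],
    parent_cols: Sequence[int],
) -> bool:
    """True when every child value on `child_cols` also occurs in the parent on `parent_cols`."""
    parent_keys = {tuple(row[c] for c in parent_cols) for row in parent}
    return all(tuple(row[c] for c in child_cols) in parent_keys for row in child)


# Hypothetical state: employees reference departments through column 1.
departments = [("d1", "Sales"), ("d2", "Research")]
employees = [("alice", "d1"), ("bob", "d2")]
orphaned = employees + [("carol", "d9")]  # "d9" has no parent department
```

Here `employees` satisfies the constraint against `departments`, while `orphaned` violates it, much like a child segment occurrence without a parent.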

In this paper we only studied structure mappings. A complete
view specification will also include operation mappings. An
operation applied to the view must be mapped to operations on
the underlying relations such that the effect will be as if the

8. Appendix: Proofs of Selected Theorems

Theorem 1. Let s be a schema and suppose id_1 and id_2 are
associated with FDs derived on expressions over s. Then
Eqv(id_1,id_2) true implies that for all states st and value
sequences x, if id_1(st;x) and id_2(st;x) are defined, then they
are equal.

Proof. By removing unnecessary tuples, we may consider the
given state st to be the result of a procedure gentabl in which
the new formalvalue function is replaced by a function not
necessarily generating unique values. Suppose we apply gentabl
to id_1 and id_2, obtaining a structure str1. Clearly, any
equality relationships holding in the free structure fstr (before
applying the intervals procedure) will also hold on the
analogous tuples in str1. Now apply intervals to str1 and fstr.
This will not change the above property: any equality holding in
fstr will still hold on analogous tuples in str1. But str1 is
already a state, hence id_1(fstr,V) = id_2(fstr,V) implies
id_1(str1;x) = id_2(str1;x), i.e., id_1(st;x) = id_2(st;x). □

Theorem 3. Let s be a schema and e be an expression over s, and
consider the following two statements:

(i) Every constraint in the set is valid;

(ii) for every FD {id_i}:Z->A in the set, for every state st
and every sequence x of universe elements, a ∈
e[Z=x[Z]][A](st) if and only if for some i, id_i(st;x)=a.

Let S be a set of EQs and labelled FDs defined on an
expression e such that (i) and (ii) hold on S. Then (i) and (ii)
hold on Cl(S).




Proof. We assume (i) and (ii) hold on S.

If c ∈ Cl(S) is present by one of clauses [1] through [5],
then property (i) holds because of the basic properties of
equality.

If id:X->Y ∈ Cl(S) because X=Y ∈ Cl(S) (clause [6]), then
X->Y is valid because equality is functional (and valid by the
induction hypothesis). Also, for any state st and sequence x, a
∈ e[X=x[X]][Y](st) if and only if for some t ∈ e(st), x[X] =
t[X] = t[Y] = a; i.e., if and only if a = id(st;x).

If id:∅->X ∈ Cl(S) because X=V ∈ Cl(S) (clause [7]), then
∅->X is valid because the X domain is constant. Also, for any
state st and sequence x, a ∈ e[∅=x[∅]][X](st) = e[X](st), if
and only if a = V = id(st;x).

Suppose that {id_1i}:Z->A_1 and {id_2j}:Z->A_2 are in Cl(S) and
that statement (ii) above is true for them and also that
Eqv(id_1i,id_2j) = true for all i,j. Then for every state st and
any t ∈ e(st), t[A_1] ∈ e[Z=t[Z]][A_1](st), so for some i,
t[A_1]=id_1i(st;t). Similarly, for some j, t[A_2]=id_2j(st;t).
Hence t[A_1]=t[A_2] and so A_1=A_2 is valid for e.

Suppose id_1i:Z->A and id_2j:A∪X->B are in Cl(S) and satisfy
(i) and (ii), and that id_3ij:Z∪X->B is obtained per clause [9].
By inspection of the algorithm, we can see that id_3ij(st)(x) =
id_2j(st)(x[id_1i(st)(x)/A]) (where x[y/k] denotes the sequence
obtained from x by putting y at position k). By definition, b ∈
e[Z∪X=x[Z∪X]][B](st) if and only if there is a t ∈ e(st) such
that t[B]=b and t[Z∪X]=x[Z∪X]. This is true if and only if
there is a t ∈ e(st) such that t[B]=b and t[A∪X] =
x[t[A]/A][A∪X] and t[Z]=x[Z]. By induction, we know that b ∈
e[A∪X=x[t[A]/A][A∪X]][B](st) if and only if b =
id_2j(st;x[t[A]/A]), and that t[A] ∈ e[Z=x[Z]][A](st) if and only
if t[A] = id_1i(st;x). Hence b ∈ e[Z∪X=x[Z∪X]][B](st) if and
only if b = id_2j(st;x[t[A]/A]) = id_3ij(st;x). □

Theorem 4. Let s be a schema, and let e be an expression over
s. Then statements (i) and (ii) of Theorem 3 hold on the set
Drv(e).

Proof. This proof is quite straightforward, with most of the
work having been done in the previous theorem. We will only
show how clause (ii) proceeds for projection and how clause (i)
proceeds for union:

Suppose id_2:Z->A ∈ Drv(e[X]), where id_1:X[Z]->X[A] is in
Drv(e) and id_2 is as described in clause [2]. First note that
id_2(st)(x) = id_1(st)(x'), where x' = x[x[Z]/X[Z]]. Then a ∈
e[X][Z=x[Z]][A](st) if and only if there is a t ∈ e(st) such
that a = t[X][A] = t[X[A]] and t[X[Z]] = t[X][Z] = x[Z] =
x'[X[Z]], i.e., if and only if a ∈ e[X[Z]=x'[X[Z]]][X[A]](st),
which means, by induction, that a = id_1(st)(x') = id_2(st)(x).

For union, suppose e is e_1∪e_2 and that {id_1i, id_2j}:Z->A ∈
Drv(e), where {id_1i}:Z->A ∈ Drv(e_1), and {id_2j}:Z->A ∈ Drv(e_2)
and Eqv(id_1i,id_2j) for all i,j. Suppose t, t' are in e(st) and
that t[Z]=t'[Z]. There is some id_kn (k=1 or 2) such that
t[A]=id_kn(st;t) and there is some id_ln (l=1 or 2) such that
t'[A]=id_ln(st;t). Regardless of the values of k and l, we have
Eqv(id_kn,id_ln), and so t[A]=t'[A]. □

Theorem 5. Let s be a schema. The following problem is
undecidable:

Given arbitrary relational expressions e_1 and e_2 over s
of degree n, determine if e_1=e_2.

Proof. Let us say that two expressions e_1 and e_2 are strongly
equivalent (with respect to some schema s) if e_1(str)=e_2(str)
for all structures str for s. In [Solo] it is proved that the
strong equivalence problem is undecidable when the expressions
do not contain selections. Clearly, the problem is still
undecidable when selections can also appear. From this we
conclude that equivalence problem (i) is undecidable since an
algorithm for it would still work when the schema contains no
constraints.

As an aside, we will show that we can also reduce equivalence
problem (i) to the strong equivalence problem:

First we show how to express constraints as relational
algebra equalities: An FD R:Z->A is true in a structure str if and
only if (R[Z=Z]R)[A=A'](str) = (R[Z=Z]R)(str), where
A'=A+deg(R); a DEQ R:X=Y is true in str if and only if
R[X=Y](str) = R(str), and a VEQ R:X=V is true in str if and only
if R[X=V](str) = R(str).
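The encoding of an FD as an algebra equality can be tried out on small relations. The sketch below is an illustration under stated assumptions: a relation is a Python set of equal-length tuples, domains are 0-based positions (so the shifted domain A' is `a + deg`), and the helper names `self_join_on`, `select_equal` and `fd_holds_algebraically` are hypothetical.

```python
from typing import Sequence, Set, Tuple

Row = Tuple[object, ...]


def self_join_on(rows: Set[Row], z: Sequence[int]) -> Set[Row]:
    """R[Z=Z]R: concatenate every pair of tuples of R that agree on the domains Z."""
    return {t + u for t in rows for u in rows if all(t[i] == u[i] for i in z)}


def select_equal(rows: Set[Row], i: int, j: int) -> Set[Row]:
    """Selection [i=j]: keep the tuples whose domains i and j carry the same value."""
    return {t for t in rows if t[i] == t[j]}


def fd_holds_algebraically(rows: Set[Row], z: Sequence[int], a: int) -> bool:
    """Z->A holds iff selecting A=A' on the self-join removes no tuple."""
    deg = len(next(iter(rows))) if rows else 0
    joined = self_join_on(rows, z)
    return select_equal(joined, a, a + deg) == joined


good = {(1, "x"), (2, "y")}  # domain 0 determines domain 1
bad = {(1, "x"), (1, "y")}   # two tuples agree on domain 0 but differ on domain 1
```

On `good` the two sides of the equality coincide; on `bad` the selection drops the mixed pairs, so the equality fails and the FD is reported as violated.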

Given a schema s, let L be the cross product of the
left-hand-sides of every equality as above corresponding to each
constraint in s, and let R be the cross product of the
corresponding right-hand-sides. Note that L(str)=R(str) if and
only if str is a state, and that L(str) ⊆ R(str) for all
structures str. Now suppose that the relations in s are
R_1,...,R_n. Define the expression W = R_1[1]∪...∪R_n[1]. Then W
has degree 1, and W(str)=∅ if and only if R(str)=∅ for each R ∈
s. Now define E = W-(W×(R-L))[1]. This expression has the
property that E(str)=∅ if and only if str is not a state or str
is empty. We now have the reduction: e_1=e_2 if and only if e_1×E
is strongly equivalent to e_2×E. □

Theorem 10. Let s be a schema; let e be an expression over s
according to syntax IV, and let e be consistent. Then any valid
EQ on e is a member of Drv(e), and if Z->A is valid on e, then
for some Z_1 ⊆ Z, Z_1->A ∈ Drv(e).

Proof. First consider a selection e=R[X_1=V_1]...[X_m=V_m], where
X_i, V_i are single domains or values, respectively. We first
define a set of equivalence classes of domains {E_i : 1≤i≤n}
determined by the EQ constraints on e. This will allow us to
remove EQs from consideration and to utilize existing theorems
on the completeness of rules for FDs.

For each set E_i we associate a value V_i: If X ∈ E_i and X=V ∈
Drv(R), then V_i=V. (This value will be unique if it exists
since R is consistent.) Otherwise, V_i is an arbitrary, unique
integer. (Hence, i≠j implies V_i≠V_j.) We first define a
state st_1 such that e(st_1) contains only one tuple t. Namely, t
is such that t[X]=V_i if and only if X ∈ E_i. Note that since
R(st_1) has a cardinality of one, all FDs are true in R(st_1).

Now, if X=Y is not derivable, then for some i and j, X ∈ E_i, Y ∈
E_j, and i≠j. Hence (in e(st_1)), t[X]=V_i, t[Y]=V_j, and so
t[X]≠t[Y]. The state st_1 is therefore a counterexample state
showing that e:X=Y is not valid.

Similarly, if X=V is not derivable, then V_i≠V, where X ∈ E_i
and t[X]=V_i, t[X]≠V, again showing that st_1 is a
counterexample to the validity of e:X=V.

Now consider an FD X->Y such that for no X_1 ⊆ X is X_1->Y ∈
Drv(e). Using the sets {E_i : 1≤i≤n} defined above we define a
function g which maps FDs on e to FDs on an n-ary relation R'.
First, we write g(X)={i} if X ∈ E_i and V_i was an arbitrary
value; otherwise g(X)=∅ (where V_i was assigned because X=V_i
was derivable). Then for a set Z={Z_1...Z_n} of domains, g(Z) =
g(Z_1)∪...∪g(Z_n), and finally, g(Z->A) = g(Z)->g(A). We let s'
be the schema consisting of R' and also every FD g(Z->A) where
R:Z->A ∈ s. First we note that g(X)->g(Y) is not derivable in
s', i.e., there is no X' ⊆ g(X) such that X'->g(Y) ∈ Drv(R').
(X->Y is the given non-derivable FD.) Since the reflexive and
pseudotransitivity rules are complete (according to our
interpretation which incorporates augmentation) for FDs on one
relation (with no EQs) [Arms], there exists a state R'(st_2) such
that g(X)->g(Y) is false in R'(st_2). We now use g to construct
a counterexample state st_3 for R:X->Y. To do this we define a
function h, an inverse of g, which, from tuples of R'(st_2), will
yield tuples of R(st_3). Namely, if t' ∈ R'(st_2), then h(t') is
the tuple t such that t[X]=V if X=V is derivable, and t[X]=t'[i]
when V_i was arbitrary and X ∈ E_i. Then we let R(st_3) =
h(R'(st_2)). It is not hard to see that all constraints in s are
true in R(st_3) and that X->Y is false in R(st_3).
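The single-tuple counterexample state used for underivable EQs and VEQs in the selection case can be sketched as follows. This is an illustration only, under loud assumptions: domains are 0-based positions, `eqs` is taken to be the complete set of derivable domain equalities, `veqs` maps a domain to its derivable constant, the expression is consistent, and the function names are hypothetical. Fresh values are modelled as tagged pairs so that they cannot collide with constants.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def equivalence_classes(n_domains: int, eqs: Iterable[Tuple[int, int]]) -> List[int]:
    """Return, for each domain, a representative of its class under the given EQs."""
    parent = list(range(n_domains))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for x, y in eqs:
        parent[find(x)] = find(y)
    return [find(i) for i in range(n_domains)]


def counterexample_tuple(
    n_domains: int,
    eqs: Iterable[Tuple[int, int]],
    veqs: Dict[int, Hashable],
) -> Tuple[Hashable, ...]:
    """Build the one tuple t with t[X] = V_i exactly when X lies in class E_i."""
    rep = equivalence_classes(n_domains, eqs)
    value_of_class: Dict[int, Hashable] = {rep[x]: v for x, v in veqs.items()}
    fresh = 0
    result = []
    for i in range(n_domains):
        if rep[i] not in value_of_class:
            # No constant is derivable for this class: use an arbitrary unique value.
            value_of_class[rep[i]] = ("fresh", fresh)
            fresh += 1
        result.append(value_of_class[rep[i]])
    return tuple(result)
```

For example, `counterexample_tuple(4, [(0, 1)], {2: "c"})` gives domains 0 and 1 the same fresh value, domain 2 the constant `"c"`, and domain 3 a different fresh value, so any EQ between domains of different classes is false in the resulting one-tuple state.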

We next assume that the expression e is a cross product of
selections or other cross products. That is, e is of the form
R_1[X_1=V_1]×...×R_n[X_n=V_n], where each X_i and V_i may be a list.
Suppose that Z->A is such that Z_i->A ∉ Drv(e) for every Z_i ⊆
Z. Let Z_1 ⊆ Z be the domains of e appearing in the same
selection term that A appears in. (Z_1 may be empty.) Let us also
assume that this term is first in e, i.e., that we can write e
as R[X_1=V_1]×e_2. Then for no Z_2 ⊆ Z_1 is Z_2->A ∈ Drv(R[X_1=V_1])
for otherwise we would have Z_2->A ∈ Drv(e). By the induction
hypothesis, there is a state st_1 such that Z_1->A is false in
R[X_1=V_1](st_1). If e_2(st_1) (the remainder of e) is nonempty,
then Z->A will also be false in e(st_1), for if t_1 and t_2 are
tuples of R[X_1=V_1](st_1) contradicting Z_1->A and if t' is any
tuple of e_2(st_1), then t_1×t' and t_2×t' are tuples in e(st_1)
contradicting Z->A. It remains to show that if e_2(st_1)=∅, we
can modify the state to get e_2(st_1) nonempty while retaining
tuples in R[X_1=V_1] contradicting Z_1->A.

First consider the subexpression r of e consisting of all the
selections on the base relation R: r = R[X_1=V_1]×...×R[X_n=V_n].
(The Xs and Vs have been renumbered and any selection term not
in r is on a base relation other than R.) We will write this as
s_1×...×s_n. Before we proceed to modify st_1 so that r is
nonempty, we prove a lemma. Note that every domain Y in r can
be written d_i+X where 1≤X≤deg(R) and d_i (a "displacement")
equals (i-1)deg(R), where Y is a domain in the term s_i. Then we
have:

(i) if d_i+X=d_j+Y ∈ Drv(r), i≠j, X is different from Y,
and no VEQ d_i+X=V or d_j+Y=V is derivable in r, then
d_i+X=d_j+X and d_i+Y=d_j+Y are in Drv(r), and X=Y is in
Drv(R);

(ii) if d_i+X=V ∈ Drv(r), then ∅->X ∈ Drv(s_i), and

(iii) if d_i+X=d_j+X ∈ Drv(r), then ∅->X ∈ Drv(s_i)∩Drv(s_j).


This lemma describes the kind of FDs which can hold on a
cross product of selection terms, each term on the same base
relation. A formal proof of the properties is given, but first
we will give an intuitive discussion.

Consider a cross product r = s_1×...×s_n, where each s_i has
the form R[X_i=V_i], with X_i, V_i lists.

An EQ within a single term, e.g., d_i+X=d_i+Y could be the
result of the selections on that term: d_i+X=V and d_i+Y=V are in
Drv(s_i). If this is not the case, then the EQ must be the result
of the EQ X=Y which must be in Drv(R). This is because the only
linking from one term to another in r is through VEQs with the
same value. The transitivity rule may have been used in the
derivation of d_i+X=d_i+Y, but eventually we will arrive at the
above result.

Consider an EQ d_i+X=d_j+X, an equality of corresponding
domains in different terms. As above, this equality could be
the result of VEQs d_i+X=V in Drv(s_i) and d_j+X=V in Drv(s_j). If
this is not the case, it must (eventually) be the result of an
FD Z->X and VEQs d_i+Z=V in Drv(s_i) and d_j+Z=V in Drv(s_j). By
composing, we get ∅->d_i+X in Drv(s_i)∩Drv(s_j). That is, if an
EQ is derived by equivalent FDs, the left-hand-sides of the FDs
must be equal because of VEQs and this means that the domains

Consider an EQ d_i+X=d_j+Y with i≠j and X≠Y. Again, this
EQ may be the immediate result of VEQs. If not, it will be the
result of the previous two cases using transitivity. That is,
d_i+X=d_i+Y will be derivable in r, and X=Y will be derivable in
R. We can further derive d_i+Y=d_j+Y in r.

Finally, consider a VEQ d_i+X=V in Drv(r). If this VEQ is also
in Drv(s_i), then ∅->X is in Drv(s_i). But d_i+X will be
constant in s_i even if the VEQ is not derivable in s_i, since
cross products do not filter out any tuples.

To  prove  (i),  note  that  rule  (2)  of  Cl  (transitivity)  is  the 
only  one  which  can  generate  the  indicated  EQ.  So  we  use  induc¬ 
tion  on  the  number  of  applications  of  rule  (2). 

There  are  only  two  possible  pairs  of  EQs  to  which  transi¬ 
tivity  can  be  applied  and  which  themselves  are  not  the  result  of 
transitivity.  One  pair  consists  of  dj+X-d^+X,  which  would  be 
the  result  of  rule  (8),  and  d^+X-d ,  which  would  be  the  result 
of  an  EO  X*Y  derivable  in  and  Sj.  From  x«Y  we  can  qet 
d i *Y*d  j ♦X ,  which  yields  dj+Y-d^+Y  derivable  in  r,  and  also 
dj*Y«dj*X,  which  yields  d j ♦X*d j+X  derivable  in  r.  Now  the  EQ 
X*Y  must,  in  fact,  be  derivable  in  R,  for  the  only  other  way  to 
get  an  EQ  in  a  selection  on  a  base  relation  is  to  derive  it  from 
VEQs  which  we  assumed  impossible. 

The  other  pair  of  EQs  is  like  the  one  above  with  the  roles  of 
X  and  Y  interchanged. 

Now assume the hypotheses hold and also that there are EQs
d_i+X=d_k+Z and d_k+Z=d_j+Y derivable in r. First assume X≠Z≠Y.
We cannot have any VEQs d_k+Z=V since this would yield d_i+X=V and
d_j+Y=V. By induction, we have d_i+Z=d_k+Z and d_k+Z=d_j+Z derivable
in r, and X=Z and Y=Z derivable in R. We therefore have X=Y
derivable in R and d_i+Z=d_j+Z derivable in r, and from this we
get d_i+X=d_j+X and d_i+Y=d_j+Y derivable in r. If Z is X, then we
have d_i+X=d_k+X and d_k+X=d_j+Y derivable in r. By induction on
the second EQ, we have d_k+X=d_j+X and d_k+Y=d_j+Y derivable in r
and X=Y derivable in R. From this we derive d_i+X=d_j+X and
d_i+Y=d_j+Y in r. If Z is Y we proceed analogously.

To  prove  part  (ii),  first  assume  XaV  €  Drv(Sj).  Then, 
clearly,  ^->X  €  Drv(sj).  If  dj*X»d^Y  and  dj+YsV  are  in 
Drv(r)  and  i?j  and  Xi<Y,  then  by  (i)  dj+X-dj+X  €  Drv(r).  By 
(iii),  6-~>X  £  Drv(sj).  If  i»j  or  X  is  Y,  we  also  get  rf->X  € 

Drv (sj )  . 

To  prove  (iii),  we  have  that  if  dj+XsV  and  dj+X«V  are  in 
Drv(r),  then  ^->X  €  Drv  ( s  j )  flDrv  (s  j  )  by  (ii).  If  d^+x-d^  +  Y  and 
d^+Y-dj+X  are  in  Drv(r),  and  X*Y,  then  by  (i),  dj+x-d^+X  and 
d^+x-d^+x  are  in  Drv(r),  so  by  induction,  0~>X  6  Drv(Sj)  and 
4->X  €  Drv(Sj).  If  the  EQ  is  the  result  of  rule  (81,  then 
there  must  be  an  FD  Z->X  €  Drv ( R)  such  that  dj+Z»dj+Z  is  in 
Drv(r).  By  induction,  d->Z  €  Drv  ( s  j )  flDrv  ( s  j )  ,  and  therefore 
*->X  6  Drv  (s  ^ ) riDrv  (  s  j )  . 

This  proves  the  lemma. 

We  now  will  indicate  how  to  modify  stj.  We  may  suppose  that 
R ( st | )  -  RfXjiVj 1 (stj)  •  (tj,t2).  (If  not,  delete  the  extra 
tuples;  they  do  not  add  anything.)  First  let  u  be  a  1:1  function 
defined  on  values  appearing  in  tj  and  t2  such  that  its  image  is 

distinct  from  the  values  in  tj  and  t2  and  from  the  values 

ippearing  in  the  selection  terns  of  r.  Let  Ej,...,En  be  the 
equivalence  classes  of  the  domains  of  r  under  Associate  a 

value  Vj  with  each  Ej  as  follows:  (i)  Vj*V  if  X*V  <?  Drv(r)  for 
sone  X  «■  Ej;  (ii)  if  there  is  some  X  €  Ej  such  that  l<X<deg(R) 
(if  X  is  a  domain  of  s^),  then  Vj-uftjfXl);  (iii)  otherwise  Vj 
is  an  arbitrary  unique  value.  Define  R(st->)  -  (tj,t2)  as 

follows:  t j  (XI  *  Vk  if  X  «  Ek  and  is  assigned  by  a  VEQ; 


q  [X]  *  u(Cj(X])  ,  otherwise.  Also,  qiXl  *  Vk  if  X  €  Ek  and 
Vk  is  assigned  by  a  VEQ;  tjlX]  »  u(t2(X]),  otherwise. 

We  first  show  that  st2  is  a  state,  that  t|  and  tj,  appear 
in  Sj ,  and  that  st,  is  still  a  counterexample  state  for  Zj->A: 

If  XeV  is  a  component  of  Xj*Vj,  then  Vk  is  assigned  by  a  VEQ, 
where  X  €  £k  and  Vk  =  V.  Hence  t|(X)«tJ(X)«V.  Thus  R(st2) 
satisfies  the  VEQs  of  RfXjaq),  and  R  (  Xj  )  ( st  2 )  =  R(st2). 

For  the  other  properties  of  R(st2),  we  will  show  that 
q(X)«q(X]  if  and  only  if  q(X)-t7[Xl.  First,  it  is  clear 
fron  the  definition  that  qfX^tjlX]  implies  tjfxl^tqxl. 

Now  suppose  q  (Xl-tqxl  .  If  Vk  is  not  assiqned  by  a  VEQ , 
where  X  €  Ek,  then  tJfX]  *  u(qfX))  and  tJfX)  -  u(t2fX)), 
and  since  u  is  1:1,  we  get  q(Xl»t2{X]. 

Now  suppose  q  [Xl  -  Vk  *  q(XF,  where  X  e  Ek  and  Vk  is 
assigned  by  a  VEQ.  This  means  that  X*Vk  is  derivable  in  r.  By 
part  (ii)  of  the  lemma,  «*- >  X  is  in  Drv(Sj).  Since  q  ,  t2  € 
RlXjaVjl  (sq)  ,  we  have  q(Xl*t2(Xl.  This  proves  t^  tXl*t2rxl  if 
and  only  if  qfxl-qixi.  From  this  we  can  conclude  that  FDs 
and  DEQs  which  are  true  (false)  in  R(sq)  are  true  (false)  in 
R(st2).  Thus  st2  is  a  state  of  s,  and  Zj->A  is  false  in 
RtXj-V^  ( s 1 2 )  . 

The  next  step  is  to  show  that  we  can  add  tuples  to  R ( st  j)  to 
get  a  nonempty  state  for  every  other  selection  in  r. 

Consider  a  selection  term  Sj*RfXj»Vjl  (i»2,...,n)  in  r  of 
displacement  d^;  that  is,  the  domains  of  the  selection  are 
d  j  +  1  , . .  .  ,d  ^  >deg  (R)  .  define  a  tuple  q  by  qfXl«Vk  where  d^*X  6 


Ek.  By  construction,  if  tj  is  placed  in  R(st2),  then  tj  will 
appear  in  R(Xj*Vjl.  It  is  also  easy  to  see  that  tj  will  satisfy 
all  EQs  in  schema  s.  We  must  show  that  when  t2,...,tn  so  con¬ 
structed,  are  added  to  R(st2),  that  no  FDs  are  violated.  First 
we  show  that  there  will  be  no  FD  violations  among  t2,...,tm.  So 
suppose  R:W->B  is  in  s  and  t|(W)*tj[W).  Then  for  each  component 
Wk  of  W,  t j (W^ ) -t j 1 .  If  one  of  the  values,  say  Vw,  was 
assigned  by  clause  (i)  or  (iii)  of  the  definition  of  (Ej),  then 
both  d^*Wk  and  dk*wk  are  in  Ew  since  this  Vw  is  distinct  from 
all  other  values  associated  with  E-sets.  If  one  of  the  values 
was  assigned  by  the  second  clause,  then  both  were,  and  we  have, 
for  some  Uj  and  U2,  1<U j ,U2<deg (R) ,  dj+Wk»Uj  and  dj+Wk«U2  deriv¬ 
able.  If  Uj  is  wk ,  we  have  dj+W^aw^  derivable  in  r;  if  not, 
then  part  (i)  of  the  lemma  will  give  dj+Wk*Wk  derivable  in  r. 
Similarly,  dj  +  Wk=*wk  is  derivable  in  r.  Hence  d  j  *Wk  »d  j  ♦W((  is 
derivable  in  r.  The  component  k  was  arbitrary,  so  we  may  write: 
dj+Wadj+W  is  derivable  in  r.  Now  dj+W->dj*B  and  d^+W->dj+B  are 
also  derivable  in  r  and  have  the  sane  identifier.  We  therefore 
have  dj*B*dj*B  6  Drv(r),  and  therefore  tj(Bl*tj(B)  since  dj  +  B 
and  dj+B  are  in  the  same  E-set. 

Now  we  will  show  that  there  will  be  no  FO  violations  between 
t|  or  t2  and  any  of  the  tj,  i*2,...,m.  So  suppose  R:W->B  is 
in  s  and  t'fw)*tj{wl  (where  t'  is  tj  or  t2).  For  each  com¬ 
ponent  Wj  of  W,  t ' (Wj 1 «t j fw j 1 .  First  suppose  Vk  is  assigned  by 
a  VE9»  or  by  the  third  clause,  where  dj*Wj  G  Ek.  The  values 
assigned  to  these  E-sets  are  unique,  so  we  also  have  O+W^  G  Ek, 
i.e.,  W  j  *d  j  +w 1  is  derivable  in  r.  Next  suppose  Vk  was  assigned 
by  the  second  clause:  There  is  an  X  6  Ek  with  l<X<deq(R).  We 


have  d^+Wj-X  derivable,  but  from  the  lemma,  we  also  get  d^+Wj^Wj 
derivable.  Collectively,  we  have  W-dj+W  derivable  in  r.  Since 
W->B  and  dj+w->dj+B  have  the  same  identifier,  we  get  B-dj+B  in 
Drv(r).  If  these  domains  (B  and  dj  +  B)  are  in  an  E-set  Ek  whose 
value  is  assigned  by  a  VEQ ,  then  t ' ( B 1 -t j ( B ] «Vk.  Otherwise,  Vk 
is  equal  to  u(tj(X)),  where  l<X<deg(R)  and  X  €  Ek.  Since  X»B 
will  be  in  Drv(r),  it  is  also  in  Drv (RfXjsVj ] )  and  if  t*  is 
t|,  then  we  know  t j (B) «u (tj (X) ) -u(tj [B] ) «t J [B] .  If  t'  is 
t^,  then  from  B-dj+B  we  can  conclude  that  rf->B  is  in 
DrviRIXj-Vjl ) ,  and  this  also  yields  tj (B1 «tj [ B) *»t^lB] .  Thus 
no  FDs  are  violated,  and  we  may  define  a  state  st^  by  R(st-j)  ■ 

R ( s 1 2 )  (J  {t2,...»tnl.  This  state  will  have  the  property  that 
r(stj)^  and  Z^->A  is  false  in  RfXjeVjl  (st^)  . 

We still may have e(st_3)=∅ because selections on other
relations may be empty. However, by a process similar to the
one above, we may add tuples to get a nonempty cross product.

The  case  for  EQs  which  are  not  derivable  in  the  cross  product 
can  be  handled  in  an  analogous  fashion. 

This  completes  the  case  for  cross  product. 

Now suppose that e is a union e_1∪e_2 and that Z_1->A ∉
Drv(e_1∪e_2) for every Z_1 ⊆ Z. In particular, we have Z->A ∉
Drv(e_1∪e_2). Suppose Z->A is not derivable in e_1. By induction
there is a state st_1 such that Z->A is false in e_1(st_1). Then
Z->A will also be false in (e_1∪e_2)(st_1). If Z->A is not
derivable in e_2, we proceed in a similar manner. In the remaining
case {id_1i}:Z->A ∈ Drv(e_1), {id_2j}:Z->A ∈ Drv(e_2) but
¬Eqv(id_1i,id_2j) for some i,j. By Theorem 2, there is a state st
and a valuation x such that id_1i(st;x)≠id_2j(st;x), and by
Theorem 3 there are tuples t_1 ∈ e_1(st) and t_2 ∈ e_2(st) such that
t_1[Z]=t_2[Z] but t_1[A]≠t_2[A]. Thus we will have t_1, t_2 ∈
(e_1∪e_2)(st), and these tuples will contradict the FD Z->A.
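The situation in the remaining case, where an FD holds in each operand of a union yet fails in the union itself, is easy to reproduce on a small state. The sketch below is purely illustrative: relations are assumed to be Python sets of tuples with 0-based domain positions, and `fd_holds`, `e1_state` and `e2_state` are hypothetical names standing for the values of e_1 and e_2 in some state.

```python
from typing import Dict, Sequence, Set, Tuple

Row = Tuple[object, ...]


def fd_holds(rows: Set[Row], z: Sequence[int], a: int) -> bool:
    """Check Z->A directly: tuples that agree on Z must agree on A."""
    seen: Dict[Tuple[object, ...], object] = {}
    for t in rows:
        key = tuple(t[i] for i in z)
        # Record the first A-value seen for this Z-value and compare later ones to it.
        if seen.setdefault(key, t[a]) != t[a]:
            return False
    return True


e1_state = {(1, "a"), (2, "b")}  # domain 0 determines domain 1 here
e2_state = {(1, "c")}            # and here, but with a different A-value for Z=1
union_state = e1_state | e2_state
```

Both operands satisfy the FD from domain 0 to domain 1, but the tuples `(1, "a")` and `(1, "c")` in the union agree on Z and differ on A, so the FD is false in the union.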

If an EQ is not derivable in e_1∪e_2, it is not derivable in
at least one of e_1 and e_2. We construct a counterexample state
by induction on the appropriate component (e_1 or e_2), and this
will also be a counterexample state for the union.

In the last case, e is a projection e_1[X]. If c is any
constraint not derivable in e, then c_1 will not be derivable in
e_1, where c_1 is c with each domain Y replaced by X[Y]. By
induction, we construct a counterexample state for c_1 in e_1 and
this will also be a counterexample state for c in e. □

The  theorem  demonstrating  the  properties  of  G  needs  the  fol¬ 
lowing  lemma  which  says  that  moving  operators  around  does  not 
change  the  set  of  derived  constraints. 

Lemma. Let s be a schema, and let e, e_1 and e_2 be expressions
over s. Then

(i) Drv(e[X=Y][Z=V]) = Drv(e[Z=V][X=Y])

(ii) Drv((e_1×e_2)[X=V]) = Drv(e_1[X=V]×e_2), if X≤deg(R)

(iii) Drv((e_1×e_2)[X=V]) = Drv(e_1×(e_2[X=V])), if X>deg(R)

(iv) Drv((e_1∪e_2)[X]) = Drv(e_1[X]∪e_2[X])

Proof. (i) From the definitions we have

Drv(e[X=Y][Z=V]) = Cl(Cl(Drv(e) ∪ X=Y) ∪ Z=V) and
Drv(e[Z=V][X=Y]) = Cl(Cl(Drv(e) ∪ Z=V) ∪ X=Y).

It is not hard to show that the inclusion Cl(Cl(Drv(e) ∪ X=Y) ∪
Z=V) ⊆ Cl(Cl(Drv(e) ∪ Z=V) ∪ X=Y) is true and also that the
reverse inclusion holds.

(ii) The definitions give us

Drv((e_1×e_2)[X=V]) = Cl(Cl(Drv(e_1) ∪ Drv'(e_2)) ∪ X=V) and
Drv(e_1[X=V]×e_2) = Cl(Cl(Drv(e_1) ∪ X=V) ∪ Drv'(e_2)).

Again we can show equality by showing inclusion in both
directions.

Part (iii) is analogous.

(iv) The formulas give us

Drv((e_1∪e_2)[X]) = {id_2:Z->A : id_1:X[Z]->X[A] ∈ Drv(e_1∪e_2)} ∪
{Y=Z : X[Y]=X[Z] ∈ Drv(e_1∪e_2)} ∪
{Y=V : X[Y]=V ∈ Drv(e_1∪e_2)}, where id_1 and id_2
are as in the definition, and

Drv(e_1[X]∪e_2[X]) = {x ∈ Drv(e_1[X])∩Drv(e_2[X]) : x is an EQ} ∪
{id_1:Z->A ∈ Drv(e_1[X]) : there is id_2:Z->A
in Drv(e_2[X]) with Rdc(id_1) = Rdc(id_2)}.

First we see that

{Y=Z : X[Y]=X[Z] ∈ Drv(e_1∪e_2)} ∪ {Y=V : X[Y]=V ∈ Drv(e_1∪e_2)} =
{x ∈ Drv(e_1[X])∩Drv(e_2[X]) : x is an EQ}.

Now an FD id_1:X[Z]->X[A] is in Drv(e_1∪e_2) if and only if
id_1:X[Z]->X[A] ∈ Drv(e_1) and there is id_2:X[Z]->X[A] in Drv(e_2)
with Rdc(id_1)=Rdc(id_2). This is true if and only if id'_1:Z->A
∈ Drv(e_1[X]) and id'_2:Z->A ∈ Drv(e_2[X]), where id'_1 (id'_2) is
obtained from id_1 (id_2) by adding an arc to each leaf node and
to the root node. Now Rdc(id'_1)=Rdc(id_1)=Rdc(id_2)=Rdc(id'_2)
and so the condition is equivalent to id'_1:Z->A ∈
Drv(e_1[X]∪e_2[X]). This shows that the two sets of FDs are the
same.


Theorem 11. Let s be a schema, and let e be an expression over
s of syntax III. Then Drv(G(e))=Drv(e), and for every structure
str, G(e)(str) ⊆ e(str).


Proof. We first note that for any expressions e_1, e_2, e_3, if
e_1(str) ⊆ e_2(str), then e_1[X=V](str) ⊆ e_2[X=V](str),
(e_1×e_3)(str) ⊆ (e_2×e_3)(str), (e_3×e_1)(str) ⊆ (e_3×e_2)(str),
e_1[X=Y](str) ⊆ e_2[X=Y](str), (e_1∪e_3)(str) ⊆ (e_2∪e_3)(str) and
e_1[X](str) ⊆ e_2[X](str). In other words, the set-inclusion
condition is preserved by all the relational algebra operators
(except, of course, set difference).
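The preservation of set inclusion by every operator except set difference can be checked on small relations. The sketch below is an illustration only: relations are assumed to be Python sets of tuples with 0-based positions, and the operator helpers and the sample relations `small`, `large` and `other` are hypothetical.

```python
from typing import Sequence, Set, Tuple

Row = Tuple[object, ...]


def select_value(rows: Set[Row], x: int, v: object) -> Set[Row]:
    """Selection [X=V]."""
    return {t for t in rows if t[x] == v}


def select_eq(rows: Set[Row], x: int, y: int) -> Set[Row]:
    """Restriction [X=Y]."""
    return {t for t in rows if t[x] == t[y]}


def cross(r1: Set[Row], r2: Set[Row]) -> Set[Row]:
    """Cross product: concatenate every pair of tuples."""
    return {t + u for t in r1 for u in r2}


def project(rows: Set[Row], xs: Sequence[int]) -> Set[Row]:
    """Projection [X] onto the listed positions."""
    return {tuple(t[i] for i in xs) for t in rows}


small = {(1, 1)}
large = {(1, 1), (1, 2), (2, 2)}  # small is a subset of large
other = {(1, 2), (7, 7)}

monotone_results = [
    select_value(small, 0, 1) <= select_value(large, 0, 1),
    select_eq(small, 0, 1) <= select_eq(large, 0, 1),
    cross(small, other) <= cross(large, other),
    cross(other, small) <= cross(other, large),
    (small | other) <= (large | other),
    project(small, [0]) <= project(large, [0]),
]

# Set difference on the right-hand operand reverses the inclusion instead.
difference_preserved = (other - small) <= (other - large)
```

Every entry of `monotone_results` is true, whereas `difference_preserved` is false because `other - small` still contains `(1, 2)` while `other - large` does not.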


Now suppose e has the form r_1[X_1]∪...∪r_n[X_n], where each r_i
is a restriction of a cross product. Consider an r_i which
contains a restriction X=Y. As in the definition, r* is r_i with
X=Y removed. We treat each of the three cases in the
definition.


In the first case, since X=V ∈ Drv(r*), we have X=Y ∈
Drv(r*[Y=V]). Therefore Drv(r_i) ⊆ Drv(r*[Y=V]). On the other
hand, since X=V and X=Y are in Drv(r_i), we have Drv(r*[Y=V]) ⊆
Drv(r_i). Therefore Drv(r*[Y=V]) = Drv(r_i). This implies that
Drv(r_1[X_1]∪...∪r_i[X_i]∪...∪r_n[X_n]) =
Drv(r_1[X_1]∪...∪r*[Y=V][X_i]∪...∪r_n[X_n]). Moving the selection
term Y=V inside to its base relation gives us, by the previous
lemma, Drv(r_1[X_1]∪...∪r'_i[X_i]∪...∪r_n[X_n]), where r'_i has one
less restriction than r_i.

It is easy to see that r*[Y=V](str) ⊆ r_i(str) for all
structures str. (In fact, r*[Y=V] = r_i.)

The  second  case  is  analogous  to  the  first  case. 

In the third case, r_i is to be replaced by r' =
r*[X=V_1][Y=V_1] ∪ r*[X=V_2][Y=V_2], where V_1≠V_2 are new values.

The EQ X=Y is derivable in both terms of this union, so Drv(r_i)
⊆ Drv(r'). Any VEQ Z=V in Drv(r') must be derivable in both
r*[X=V_1][Y=V_1] and r*[X=V_2][Y=V_2]. This means that V≠V_1 and
V≠V_2, and therefore Z=V is derivable in r*. From this we
conclude that Z=V ∈ Drv(r_i). Suppose an EQ Z=W is derivable in
both r*[X=V_1][Y=V_1] and r*[X=V_2][Y=V_2]. Then it is either
derivable in r*, or it follows from X=V_1, Y=V_1 or X=V_2, Y=V_2 via
the EQ X=Y. In both cases it is derivable in r_i.

Now suppose an FD Z->A is derivable in r'. Since it must have
equivalent identifiers in both r*[X=V_1][Y=V_1] and r*[X=V_2][Y=V_2],
the identifiers must not have any nodes labelled with either V_1 or
V_2. This means that no FDs ∅->X or ∅->Y, whose identifiers would be
'V_1'->X, 'V_2'->X, and likewise for Y, can be used. However, the FDs
X->Y and Y->X, whose identifiers are themselves, can be used, but if
so, it is because the FD was derived from the EQ X=Y (unless X->Y or
Y->X ∈ Drv(r*), which is still fine). Thus Z->A will also be derivable
in r_i. We have therefore shown that Drv(r') ⊆ Drv(r_i).

Let r_i' = r*[X=V_1][Y=V_1] and r_i'' = r*[X=V_2][Y=V_2]. Then by the
previous lemma, we have
Drv(r_1[X_1] ∪ ... ∪ r_i[X_i] ∪ ... ∪ r_n[X_n]) =
Drv(r_1[X_1] ∪ ... ∪ r_i'[X_i] ∪ r_i''[X_i] ∪ ... ∪ r_n[X_n]). We also
note that if t ∈ r*[X=V_1][Y=V_1](str), then t[X]=t[Y], and so t ∈
r*[X=Y](str) = r_i(str). Similarly, we get r*[X=V_2][Y=V_2](str) ⊆
r_i(str). Thus r'(str) ⊆ r_i(str).

We have shown for every step in the construction of G(e) that the
Drv-set does not change, and that structures of the new expression
are subsets of structures of the original expression e. Therefore
Drv(G(e)) = Drv(e), and for all structures str, G(e)(str) ⊆ e(str). □

Theorem 12. Let s be a schema, and let e be an expression over s
which is consistent and of syntax III. Then any valid EQ on e is a
member of Drv(e), and if Z->A is valid on e, then for some Z_1 ⊆ Z,
Z_1->A ∈ Drv(e).

Proof. Suppose e is consistent and c ∉ Drv(e). Since e is consistent,
G(e) is consistent. If c is an EQ not in Drv(e) = Drv(G(e)), or if it
is an FD Z->A such that Z_1->A ∉ Drv(e) = Drv(G(e)) for every Z_1 ⊆ Z,
then by Theorem 10, there is a state st such that c is false in
G(e)(st). Since G(e)(st) ⊆ e(st), c is also false in e(st). □
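
The last step of this proof relies on the fact that a violated FD stays violated in any larger relation. A minimal sketch of that step follows, with an assumed encoding (sets of frozensets of attribute/value pairs) and hypothetical names; g_of_e and e_state merely stand in for G(e)(st) and e(st).

```python
def row(**kv):
    return frozenset(kv.items())

def satisfies_fd(rel, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in rel (lhs is a list of attributes)."""
    seen = {}
    for t in rel:
        d = dict(t)
        key = tuple(d[a] for a in lhs)
        # Two tuples agreeing on lhs but differing on rhs violate the FD.
        if seen.setdefault(key, d[rhs]) != d[rhs]:
            return False
    return True

# Stand-ins for G(e)(st) and e(st), with G(e)(st) a subset of e(st).
g_of_e = {row(Z=1, A=1), row(Z=1, A=2)}
e_state = g_of_e | {row(Z=2, A=5)}

violated_in_subset = not satisfies_fd(g_of_e, ["Z"], "A")
violated_in_superset = not satisfies_fd(e_state, ["Z"], "A")
```

The two witnessing tuples of the violation are still present in the superset, so the FD Z->A fails there as well. The same argument applies to an EQ, since a tuple with differing values survives in any superset.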
Key words: Multiple Views, Relational Algebra, Functional
Dependencies, Completeness




